(57) Abstract

A nonlinear model-based predictive temperature control system (100) is described for use in thermal process reactors. A multivariate 
temperature response is predicted using a nonlinear parameterized model of a thermal process reactor. The nonlinear parameterized model is 
implemented using a neural network. Predictions are made in an auto-regressive moving average fashion with a receding prediction horizon. 
Model predictions are incorporated into a control law for estimating the optimum future control strategy. The high-speed, predictive nature 
of the controller (62) renders it advantageous in multivariate rapid thermal processing reactors where fast response and high temperature 
uniformity are needed. 
AMENDED CLAIMS

[received by the International Bureau on 31 July 1997 (31.07.97);
original claims 1, 5-8, 24-26 and 31 amended; remaining claims unchanged (5 pages)]

1. A temperature controlled thermal process reactor comprising:
a reaction chamber enclosing an object to be heated;

a source of thermal energy which heats said object;

a thermal sensor which measures a temperature related to a temperature
of said object and which provides an input signal representative of said
temperature; and

a model-based predictive temperature controller which receives said
output signal representative of said temperature and which controls said source
of thermal energy in response to said output signal.

2. The temperature controlled thermal process reactor of Claim 1, wherein 
the model-based predictive temperature controller comprises multivariate temperature 
control. 

3. The temperature controlled rapid thermal process reactor of Claim 2,
wherein the model-based predictive temperature controller comprises:

a multivariable thermal process model which relates multivariate
process input thermal energy to multivariable process output temperature;

a prediction calculator which uses said thermal process model to
calculate a predicted nominal temperature output over a predetermined future
time period; and

a control calculator which uses said predicted nominal temperature
output to calculate an optimum control strategy by which to control said source
of thermal energy.

4. The temperature controlled thermal process reactor of Claim 3, wherein
said prediction calculator calculates the predicted nominal temperature output using an
auto-regressive moving average, having a predetermined prediction horizon.

5. The temperature controlled thermal process reactor of Claim 4, wherein
the prediction calculator calculates an unoptimized initial estimate for a future control
strategy.

6. The temperature controlled thermal process reactor of Claim 5, wherein
said predicted nominal temperature output is calculated recursively over a
predetermined future time period using a recursive approximation strategy, said
recursive approximation strategy beginning with said unoptimized initial estimate.

7. The temperature controlled thermal process reactor of Claim 6, wherein
said thermal process model has parameters selected to substantially decouple the
influence of system input variables from system input disturbance.

8. The temperature controlled thermal process reactor of Claim 3, wherein 
the control calculator compares said predicted nominal temperature output to a desired 
future temperature output and uses said comparison in a recursive algorithm to compute 
said optimum control strategy. 

9. The temperature controlled thermal process reactor of Claim 3, wherein
said thermal process model is a nonlinear model.

10. The temperature controlled thermal process reactor of Claim 3, wherein 
said thermal process model is based on a neural network. 

11. The temperature controlled thermal process reactor of Claim 1, wherein
the model-based predictive temperature controller comprises nonlinear multivariable
temperature control.

12. The temperature controlled rapid thermal process reactor of Claim 11,
wherein the nonlinear model-based predictive temperature controller comprises:

a nonlinear multivariable thermal process model which relates
multivariable process input thermal energy to multivariate process output
temperature;

a prediction calculator which uses said thermal process model to
calculate a predicted nominal temperature output over a predetermined future
time period; and

a control calculator which uses said predicted nominal temperature
output to calculate an optimum control strategy by which to control said source
of thermal energy.

13. The temperature controlled thermal process reactor of Claim 12, wherein
said prediction calculator calculates the predicted nominal temperature output using a
neural network.

14. The temperature controlled thermal process reactor of Claim 13, wherein 
the prediction calculator assumes a future control strategy. 

15. The temperature controlled thermal process reactor of Claim 14, wherein
said neural network is a feed forward network. 

16. The temperature controlled thermal process reactor of Claim 15, wherein 
said neural network comprises a hidden layer of neurons. 

17. The temperature controlled thermal process reactor of Claim 16, wherein
said hidden layer of neurons comprises nonlinear sigmoid-type neurons.

18. The temperature controlled thermal process reactor of Claim 13, wherein 
said neural network is trained using a pseudo least squares method. 

19. The temperature controlled thermal process reactor of Claim 12, wherein
the control calculator compares said predicted nominal temperature output to a desired
future temperature output to derive said optimum control strategy.

20. The temperature controlled thermal process reactor of Claim 1, further 
comprising a softsensor model. 

21. The temperature controlled thermal process reactor of Claim 20, wherein
said softsensor model is created from a dataset generated by using an instrumented
wafer.

22. The temperature controlled thermal process reactor of Claim 1, further
comprising a setpoint generator, said setpoint generator automatically generating a
correction to said recipe inputs into said thermal process reactor, said correction
facilitating control of actual wafer surface temperatures.

23. The temperature controlled thermal process reactor of Claim 22, said 
correction facilitating improved control of actual wafer surface temperatures based on 
measurement of susceptor temperatures.

24. A temperature control system for controlling a thermal process
comprising:

a controllable source of thermal energy which heats an object;

a temperature sensor which measures a temperature related to a
temperature of said object and which generates an output signal responsive to
said temperature; and

a model-based predictive temperature controller which receives said
output signal representative of said temperature and which controls said source
of thermal energy in response to said output signal, said controller comprising:

a thermal process model which relates process input thermal
energy to a process output temperature;

a prediction calculator which uses said thermal process model to
calculate a predicted nominal temperature output over a predetermined
future time period; and

a control calculator which uses said predicted nominal
temperature output to calculate an optimum strategy by which to control
said source of thermal energy, said controller generating output signals to
said source of thermal energy in response to said optimum strategy.

25. The temperature control system of Claim 24, wherein said thermal
process model has parameters selected to substantially decouple the influence of system
input variables from system input disturbances.

26. The temperature control system of Claim 24, wherein said prediction
calculator includes a postulated future control strategy and a recursive algorithm to
optimize said postulated future control strategy.

27. The temperature control system of Claim 24, wherein said thermal 
process model is a nonlinear model. 

28. The temperature control system of Claim 27, wherein said thermal
process model substantially decouples the influence of system input variables from
system input disturbances.

29. The temperature control system of Claim 27, wherein said prediction 
calculator comprises a neural network. 

30. The temperature control system of Claim 27, wherein said prediction
calculator comprises a feed forward neural network, said prediction calculator having a
receding calculation horizon.

31. A method of controlling a thermal process comprising the steps of:
measuring a process output temperature;

using a model to predict a future process output temperature;

using said measured process output temperatures and said predicted
future process temperature to calculate an optimum process input control
strategy; and

controlling a process input thermal energy using the calculated optimum
process input control strategy.

32. The method of Claim 31, wherein the step of predicting a future process
output temperature comprises:
identifying a thermal process model which relates process input thermal
energy to process output temperature; and

recursively predicting future process output temperatures using said
thermal process model, said process output temperature predicted over a
predetermined future time period.

33. The method of Claim 32, wherein the step of predicting future process
output temperatures further comprises periodically updating said predictions in
accordance with a receding horizon calculation.

34. The method of Claim 31, wherein the step of predicting a future process 
output temperature comprises postulating a stationary future control strategy. 

35. The method of Claim 31, wherein the step of calculating an optimum
process input control strategy comprises comparing said predicted future process output
temperatures to a desired future process output temperature.

36. The method of Claim 31, wherein the step of predicting a future process
output temperature comprises:
identifying a nonlinear thermal process model which relates process
input thermal energy to process output temperature; and

training a neural network to predict future process output temperatures
using said thermal process model, said process output temperature predicted
over a predetermined future time period.

37. The method of Claim 36, wherein the step of predicting future process
output temperatures further comprises periodically updating said predictions in
accordance with a receding horizon calculation.

38. The method of Claim 36, wherein the step of predicting a future process 
output temperature comprises postulating a stationary future control strategy. 

39. The method of Claim 36, wherein the step of calculating an optimum
process input control strategy comprises comparing said predicted future process output
temperatures to a desired future process output temperature.
MODEL-BASED PREDICTIVE CONTROL OF
THERMAL PROCESSING

Background of the Invention

Field of the Invention 

The invention relates to automatic feedback control of thermal processing. In
particular, the invention pertains to model-based predictive temperature control of
thermal process reactors such as used in semiconductor processing.
Description of the Related Art 

Until recently, most of the high temperature processing necessary for integrated
circuit fabrication was performed in hot-wall, resistance-heated batch reactors.
Controlling the wafer temperature uniformity (within-wafer, point-to-point) in these
reactors was generally not considered an issue, because the reactors were substantially
isothermal. The down-boat (wafer-to-wafer) temperature uniformity could be controlled
effectively by dividing the cylindrical heating coil into several zones, each with its own
temperature sensor, controller and power supply. The outer zones were typically
adjusted to compensate for heat losses at the furnace ends. Independent, single-loop,
off-the-shelf PID controllers sufficed for these purposes. The trend to larger wafer
diameters, the demanding uniformity requirements for ULSI applications, and the
demands for reduced thermal budget all led to an increased use of single-wafer process
reactors. For commercially feasible throughput, it is highly desirable to minimize the
process cycle time by heating substantially only the wafer and its immediate
environment. In many cases, single-wafer reactors are of the cold-wall or warm-wall
type, in which quartz or stainless steel process chambers are water or air cooled. Under
such circumstances, the system is no longer isothermal and temperature uniformity
control becomes an issue of considerable concern and technical difficulty. A recent
technical review of the field is provided in "Rapid Thermal Processing Systems: A
Review with Emphasis on Temperature Control," F. Roozeboom, N. Parekh, J. Vac.
Sci. Technol. B 8(6), 1249-1259, 1990.

Specific physical process characteristics serve to exemplify the need for precise
temperature uniformity. Homo-epitaxial deposition of silicon should be performed in a
manner which minimizes crystalline growth defects, such as lattice slip. Such defects are
induced by thermal gradients in the wafer during high temperature processing, and the
sensitivity to gradients increases with temperature. For example, while
gradients of about 100°C across an 8-inch wafer may be tolerable at a process
temperature of 900°C, respective gradients of only 2-3°C are allowable at process
temperatures of 1100°C. There is some experimental evidence to indicate that gradients
of approximately 10°C may be tolerable for a few seconds. The deposition of
polycrystalline silicon (polysilicon) typically takes place at 600-700°C where, as a rule
of thumb, a 2% uniformity degradation is incurred for every degree of temperature
gradient. Moreover, in heterodeposition processes such as polysilicon deposition,
multiple reflections and optical interference within the deposited overlayers can give rise
to emissive or absorptive changes with overlayer thickness, exacerbating the problem of
maintaining temperature uniformity (J.C. Liao, T.I. Kamins, "Power Absorption During
Polysilicon Deposition in a Lamp-Heated CVD Reactor," J. Appl. Phys., 67(8),
3848-3852 (1990)). Furthermore, patterned layers can also lead to variations in light
absorption across the wafer, creating local temperature gradients (P. Vandenabeele, K.
Maex, "Temperature Non-Uniformities During Rapid Thermal Processing of Patterned
Wafers," Rapid Thermal Processing, SPIE, Vol. 1189, pp. 84-103, 1989).

The aforementioned factors complicating the control system design are not only
manifest for rapid thermal chemical vapor deposition (RTCVD) systems, but apply to
thermal processing (TP) systems in general, where the need for precise process control
is balanced by the demand for minimal process cycle times. The generally short process
cycle times and fast dynamics of the single-wafer systems render dynamic control of
temperature uniformity a necessity of considerable technical difficulty. The radiant
heating systems used for rapid wafer heating comprise either arc lamps or banks of
linear tungsten-halogen lamps divided into several independently-controllable heating
zones. The wafer itself, in principle, represents a complex thermal system whose
interaction with the radiant energy is inherently nonlinear. Furthermore, since the
requirements for power distribution over the wafer are different for dynamic compared
to steady-state uniformity, it does not suffice to deduce the required power settings from
a wafer temperature measurement at a single point. In general, multiple sensors are
required to measure and maintain a uniform temperature distribution over the wafer.
These considerations render temperature control an essentially multi-input, multi-output
(MIMO) or multivariate problem. Due to the large interaction between zones
inherently present in radially heated systems, the conventional control techniques, for
example, using single-loop, coupled or master-slave type PID control, cannot be
expected to provide thermal process reactor systems with the required control
specifications for all operating conditions. Conventional PID control techniques are
susceptible to lag, overshoot and instability at the desirable process rates, and therefore
become limiting factors in single-wafer process reactors. Thus, there is a clear need in
electronic materials processing for systems which can maintain precise, dynamic
multivariant control while providing commercially viable wafer throughput.

The foregoing discussion has clearly outlined the need for effective uniformity
control in thermal process reactors using a multivariate approach. This view is
endorsed by many authors. See, for instance, several contributions in the Rapid Thermal
and Integrated Processing Symposium, ed. J.C. Gelpey, et al., Mater. Res. Soc. Symp.
Proc. Vol. 224, 1991. In particular, articles by Moslehi et al. (pp. 143-156), Apte, et al.
(pp. 209-214), and Norman et al. (pp. 177-183), discuss various aspects of multivariable
temperature control. Several attempts to develop models for RTP and RTCVD systems
are reported in the literature. In two examples, Norman and Gyurcsik, et al., developed
different models, both using a first-principles approach, and applied the models to
uniformity optimization (S.A. Norman, "Optimization of Wafer Temperature
Uniformity in Rapid Thermal Processing Systems," ISL Tech. Rep. No. 91-SAN-1,
Subm. to IEEE Trans. on Electron Devices, 1991; R.S. Gyurcsik, T.J. Riley, R.Y.
Sorrel, "A Model for Rapid Thermal Processing: Achieving Uniformity Through Lamp
Control," IEEE Trans. on Semicon. Manf., Vol. 4(1), 1991). The model of Norman
(1991) consists of two components. The first component models the (two-dimensional)
heat balance of the wafer and is used to compute the steady-state wafer temperature
profile for a given heat flux from the lamps. The second component models the heat
flux from the lamps as a function of the individual lamp powers. A least-squares
method is used to fit a quadratic relationship between the desired temperature at discrete
radial positions on the wafer and the flux density due to the lamps. Next, the lamp
model is used to determine optimal relative power settings for the lamps that
approximate the required flux. This method only applies to the uniformity control in
steady-state, i.e., constant input. However, Norman, et al. (1991), consider not only the
steady-state optimization problem, but also the problem of designing an optimal
trajectory. For this purpose the dynamic model is a finite-difference approximation to
the one-dimensional heat equation, including the effects of conduction in the wafer,
convective heat loss from the wafer, and radiative transfer. A minimax solution is
chosen for the steady-state uniformity optimization and trajectory following.
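
The step of choosing relative lamp powers that approximate a required flux profile can be illustrated with a small numerical sketch. The sketch below is not the method of the cited report: it assumes a purely linear lamp model (flux at each radial position is a weighted sum of zone powers) and solves an ordinary least-squares problem through the normal equations. The matrix G, the two-zone layout, and the function names are hypothetical and chosen only for illustration.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares_powers(g, flux_target):
    """Return zone powers p minimizing ||g p - flux_target||^2 (normal equations)."""
    rows, cols = len(g), len(g[0])
    gtg = [[sum(g[k][i] * g[k][j] for k in range(rows)) for j in range(cols)]
           for i in range(cols)]
    gtf = [sum(g[k][i] * flux_target[k] for k in range(rows)) for i in range(cols)]
    return solve_linear(gtg, gtf)


# Hypothetical lamp model: 4 radial positions (rows), 2 lamp zones (columns).
G = [[1.0, 0.2],
     [0.8, 0.4],
     [0.4, 0.8],
     [0.2, 1.0]]
# Build a required flux profile from known powers so the fit can be checked.
P_TRUE = [2.0, 3.0]
FLUX_TARGET = [sum(G[i][j] * P_TRUE[j] for j in range(2)) for i in range(4)]

powers = least_squares_powers(G, FLUX_TARGET)
```

Because the target profile in this example is generated from the same linear model, the fitted powers reproduce the generating values; with a real flux requirement the result would only be the best approximation in the least-squares sense.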

Dynamic system modeling is an essential ingredient of predictive control laws,
which provide the fundamental structure for a unique class of contemporary control
algorithms. In essence, system or plant control strategies are based on predicted future
plant behavior predicated on a suitably accurate dynamic plant model. The future
control strategies are not static and do not extend arbitrarily to future time slots; but
rather are periodically updated in accordance with the plant model in a so-called
receding horizon fashion. For a number of years, predictive control has been the subject
of extensive research and development. Indeed, predictive control is the central theme
behind the benchmark works of Cutler and Ramaker in their Dynamic Matrix Control
(DMC) algorithm (C. Cutler, B.L. Ramaker, "Dynamic Matrix Control - A Computer
Control Algorithm," Joint Automatic Controls Conference Proceedings, San Francisco,
1980) and Richalet, et al., in their Model Algorithmic Control (MAC) algorithm (J.A.
Richalet, A. Rault, J.D. Testud, J. Papon, "Model Predictive Heuristic Control:
Application to Industrial Processes," Automatica, Vol. 14, No. 413, 1978). Further
predictive and adaptive characteristics are incorporated by R.M.C. de Keyser, et al.,
"Self-Tuning Predictive Control," Journal A, Vol. 22, No. 4, pp. 167-174, 1981; and
more recently by Clarke, et al., in their Generalized Predictive Control (GPC) algorithm
(D.W. Clarke, C. Mohtadi, P.S. Tuffs, "Generalized Predictive Control. Part I: The
Basic Algorithm," Automatica, Vol. 23, No. 2, pp. 137-148, 1987). Much of the
contemporary control work in the literature is to some extent based on these approaches.
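
The receding horizon idea described above can be sketched in a few lines. The example below is not DMC, MAC or GPC; it is a minimal illustration that assumes a hypothetical first-order plant y(t+1) = a*y(t) + b*u(t), postulates a constant input over the prediction horizon, picks the input that minimizes the summed squared error to the setpoint, applies only the first move, and then repeats the calculation at the next sample. All names and numerical values are assumptions made for the illustration.

```python
def plan_constant_input(y_now, setpoint, a, b, horizon):
    """Best constant input over the horizon for the plant y(t+1) = a*y(t) + b*u(t).

    The prediction is split into a free response (input held at zero) and the
    response to a unit constant input; the optimum is a 1-D least-squares fit.
    """
    free, step = [], []
    f, s = y_now, 0.0
    for _ in range(horizon):
        f = a * f          # free response k steps ahead
        s = a * s + b      # response to a unit constant input k steps ahead
        free.append(f)
        step.append(s)
    num = sum(sk * (setpoint - fk) for sk, fk in zip(step, free))
    den = sum(sk * sk for sk in step)
    return num / den


def run_closed_loop(a=0.9, b=0.1, setpoint=1.0, horizon=5, steps=60):
    """Apply only the first planned move at each sample, then re-plan."""
    y = 0.0
    history = []
    for _ in range(steps):
        u = plan_constant_input(y, setpoint, a, b, horizon)
        y = a * y + b * u  # the model doubles as the plant in this sketch
        history.append(y)
    return history


history = run_closed_loop()
```

In this sketch the plant and the model are identical, so the output settles on the setpoint; with model mismatch the periodic re-planning is what lets the controller correct its earlier predictions.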

In DMC and other similar approaches, plant models are identified and cast in the
form of deterministic impulse-response or step-response models. While these model
forms are well-understood, they are often computationally cumbersome and present
significant compromises between accuracy and response for long-range model
predictions. Further, DMC appears to be incapable of handling non-minimum phase
and open-loop unstable plants. A significant redeeming feature of DMC is that of the
receding horizon, after which control increments are assumed to be zero. This
advantageous assumption is incorporated in GPC, which in various derivations also
utilizes extensions of Auto-Regressive Moving Average (ARMA) plant models such as
CARMA or CARIMA (Controlled Auto-Regressive Moving Average, CAR-Integrated-MA).
The ARMA plant models are generally represented by expressions involving
polynomials A, B and C of the time-shift operator $q^{-1}$. The shift operator $q^{-1}$ acts on a
function of a discrete time variable f(t), such that $q^{-1} f(t) = f(t-1)$ and in general
$q^{-u} f(t) = f(t-u)$. The model polynomials A, B and C act on process inputs u(t), process outputs
y(t) and process disturbances e(t) such that:

$$A(q^{-1})\, y(t) = B(q^{-1})\, u(t) + C(q^{-1})\, e(t)$$

Such models represent both the plant dynamics via the polynomials A,B and the
disturbance via A,C. A particular advantage is that the number of parameters in the
model is minimal so that they can be estimated with high efficiency. As outlined by
Clarke, et al., the long-range plant predictions are best accomplished by recursion of an
associated Diophantine equation involving the model parameters. A similar ARMA
model and recursive model prediction is also found in US Patent No. 5,301,101 by
MacArthur, et al., which discloses an adaptive receding horizon-based controller
incorporating means for operating cost minimization.
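
As a concrete reading of the ARMA relation A(q^-1) y(t) = B(q^-1) u(t) + C(q^-1) e(t), the sketch below expands the polynomials into a difference equation and simulates it sample by sample. The coefficient-list convention (index i multiplies the signal delayed by i samples), the zero initial conditions, and the example coefficients are assumptions made for the illustration; they are not taken from the text.

```python
def simulate_arma(a, b, c, u, e):
    """Simulate A(q^-1) y(t) = B(q^-1) u(t) + C(q^-1) e(t).

    a, b, c are coefficient lists; entry i multiplies the signal delayed by
    i samples, so a = [a0, a1, ...] stands for a0 + a1*q^-1 + ...
    Signals before t = 0 are taken as zero (an assumption of this sketch).
    """
    y = []
    for t in range(len(u)):
        acc = 0.0
        for i in range(1, len(a)):      # auto-regressive part, moved to the right-hand side
            if t - i >= 0:
                acc -= a[i] * y[t - i]
        for j in range(len(b)):         # contribution of past and present inputs
            if t - j >= 0:
                acc += b[j] * u[t - j]
        for k in range(len(c)):         # moving-average disturbance part
            if t - k >= 0:
                acc += c[k] * e[t - k]
        y.append(acc / a[0])
    return y


# Hypothetical example: A = 1 - 0.5 q^-1, B = q^-1 (one-sample delay), C = 1.
A_POLY = [1.0, -0.5]
B_POLY = [0.0, 1.0]
C_POLY = [1.0]
u_seq = [1.0, 1.0, 1.0, 1.0]   # unit step on the process input
e_seq = [0.0, 0.0, 0.0, 0.0]   # no disturbance

y_seq = simulate_arma(A_POLY, B_POLY, C_POLY, u_seq, e_seq)
```

For this example the difference equation reads y(t) = 0.5 y(t-1) + u(t-1), so the step response climbs 0, 1, 1.5, 1.75 toward a steady-state value of 2.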

Nevertheless, in spite of the recent effort to develop new, useful multivariant
control techniques, until now there has been little success in applying them to the
demanding conditions imposed by commercial thermal process reactors. The only
apparent successes to date have involved the use of physical models rather than the black
box models employed herein (see e.g. Cole Porter et al., "Improving Furnaces with
Model-Based Temperature Control", Solid State Technology, November 1996, page
119).

Summary of the Invention
It is an object of the present invention to provide a method and apparatus for a
more effective temperature control system in multivariant thermal processes.

In accordance with one aspect of the present invention, a temperature controlled
thermal process reactor comprises a chamber within which a thermal process is
executed, a source of thermal energy, a thermal sensor, and a model-based predictive
temperature controller. One preferred embodiment of the temperature controlled
thermal process (TP) reactor comprises a multivariate temperature controlling
arrangement. The temperature controller preferably comprises a multivariable thermal
process model that relates multivariable process input thermal energy to multivariable
process output temperature. The temperature controller also preferably comprises a
prediction calculator that uses the process model to calculate a predicted temperature
output over a predetermined future time period or prediction horizon. The preferred
temperature controller additionally comprises a control calculator that uses the predicted
temperature output to calculate an optimum control strategy by which to control the
source of thermal energy. The control calculator preferably calculates an optimum
future control strategy by comparing the predicted process output variables to a set of
desired future process output variables.

In accordance with another aspect of the present invention, a temperature control
system for controlling a thermal process comprises a controllable source of thermal
energy, a temperature sensor, and a model-based predictive temperature controller. The
model-based predictive temperature controller comprises a thermal process model that
relates process input thermal energy to process output temperature and a prediction
calculator that uses the thermal process model to calculate a predicted nominal
temperature output over a predetermined future time period. The temperature controller
further comprises a control calculator that uses the predicted nominal temperature output
to calculate an optimum strategy by which to control the source of thermal energy.
Preferably, the control calculator compares the predicted temperature output to the
desired temperature output to derive the optimum control strategy. In a preferred
temperature control system, the prediction calculator periodically updates the
predictions in accordance with an auto-regressive moving average calculator. In a
preferred arrangement, predictions are executed over a predetermined future time period,
which is updated in accordance with the auto-regressive moving average.

In still another aspect of the present invention, a method of controlling a thermal
process comprises the steps of measuring a process output temperature and using this
information in predicting a future process output temperature. The method further
comprises calculating an optimum process input control strategy and controlling a
process input thermal energy using the calculated optimum process input control
strategy. In a preferred embodiment of the method, predicting a future process output
temperature comprises identifying a thermal process model relating process input
thermal energy to process output temperature. The preferred method of prediction
further comprises recursive application of the thermal process model over a
predetermined future time period, or prediction horizon. The predictions are
furthermore periodically updated in accordance with an auto-regressive moving average
calculator. Another preferred method of rapid thermal process control comprises
calculating an optimum process input control strategy by comparing the predicted future
process output temperature to a desired future process output temperature.

In accordance with another aspect of the present invention, a temperature control
system for controlling a thermal process comprises a controllable source of thermal
energy, a temperature sensor, and a nonlinear, model-based predictive temperature
controller. The model-based predictive temperature controller comprises a nonlinear
thermal process model that relates process input thermal energy to process output
temperature and a prediction calculator that uses the thermal process model to calculate
a predicted nominal temperature output over a predetermined future time period. The
nonlinear model further comprises a neural network. In a particularly preferred
embodiment, the neural network comprises hidden neurons that are of the sigmoid type.
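
A feed forward network with one hidden layer of sigmoid-type neurons can be sketched as follows. This is only an illustration of the general structure: the logistic function as the particular sigmoid, the linear output neuron, the choice of regressor inputs, and all weight values are assumptions, not details given in the text.

```python
import math


def sigmoid(x):
    """Logistic function, one common sigmoid-type activation (assumed here)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_next_temperature(regressor, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: sigmoid hidden layer followed by a linear output neuron.

    regressor: past outputs and inputs, e.g. [y(t), y(t-1), u(t), u(t-1)]
    w_hidden:  one weight row per hidden neuron
    b_hidden:  one bias per hidden neuron
    w_out:     one output weight per hidden neuron
    b_out:     output bias
    """
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, regressor)) + bias)
        for row, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical network with 4 inputs and 2 hidden neurons. The hidden weights
# are zero here so the result is easy to verify by hand: each hidden neuron
# outputs sigmoid(0) = 0.5, giving 0.5*2 + 0.5*4 + 1 = 4.
W_HIDDEN = [[0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [2.0, 4.0]
B_OUT = 1.0

prediction = predict_next_temperature([0.3, 0.2, 0.5, 0.5],
                                      W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
```

In practice the weights would come from training on identification data, and the same forward pass would be applied repeatedly to step the prediction across the horizon.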

In accordance with yet another aspect of the present invention, a temperature
control system for controlling a thermal process comprises a controllable source of
thermal energy, a temperature sensor, a model-based predictive temperature controller,
and a softsensor model that relates susceptor temperatures to wafer temperatures. The
softsensor model provides an estimate of the immeasurable wafer surface temperatures.
In a preferred embodiment, the softsensor model is an FIR model. The model
coefficients for the softsensor FIR filter are obtained through the use of an instrumented
wafer.
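
An FIR softsensor of the kind described can be sketched as a weighted sum of the current and past susceptor temperature samples. The coefficient values, the sample data, and the rule for handling the first few samples are hypothetical; in the described system the coefficients would be identified from instrumented-wafer data rather than chosen by hand.

```python
def fir_softsensor(coeffs, susceptor_temps):
    """Estimate wafer temperature as sum_i coeffs[i] * susceptor(t - i).

    For the first samples, where t - i would be negative, the earliest
    measurement is reused (an assumption of this sketch).
    """
    estimates = []
    for t in range(len(susceptor_temps)):
        acc = 0.0
        for i, h in enumerate(coeffs):
            idx = max(t - i, 0)
            acc += h * susceptor_temps[idx]
        estimates.append(acc)
    return estimates


# Hypothetical coefficients summing to 1, so a constant susceptor temperature
# maps to the same wafer estimate (unity steady-state gain).
COEFFS = [0.5, 0.3, 0.2]
SUSCEPTOR = [600.0, 600.0, 610.0, 620.0]   # degrees C, made-up samples

wafer_estimate = fir_softsensor(COEFFS, SUSCEPTOR)
```

With these values the estimate lags the susceptor ramp: the third sample is 0.5*610 + 0.3*600 + 0.2*600 = 605 and the fourth is 0.5*620 + 0.3*610 + 0.2*600 = 613.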

In accordance with yet another aspect of the present invention, a temperature 
control system for controlling a thermal process comprises a controllable source of 
thermal energy, a temperature sensor, a model-based predictive temperature controller, a 
softsensor model that relates susceptor temperatures to wafer temperatures and a 
setpoint generator that uses the output of the softsensor model and the recipe to adjust
the setpoints so that the wafer surface temperatures will be closer to the values specified 
in the recipe. 



The model-based predictive temperature controller comprises a nonlinear thermal
process model that relates process input thermal energy to process output temperature
and a prediction calculator that uses the thermal process model to calculate a predicted
nominal temperature output over a predetermined future time period. The nonlinear
model further comprises a neural network. In a particularly preferred embodiment, the
neural network comprises hidden neurons that are of the sigmoid type.

Brief Description of the Figures
Figure 1 is a schematic perspective view of a single-wafer rapid thermal
chemical vapor deposition reactor.

Figure 2 is a schematic diagram of a prior art temperature control system used in
single-wafer reactors.

Figure 3 shows representative data characterizing the tracking and response of a
prior art multivariate temperature control system.

Figure 4 is a basic block diagram of a model-based multivariate temperature
control system.

Figure 5 is a block diagram of a multivariate model-based predictive
temperature control system.

Figure 6 is a flow chart representing a preferred predictor and controller
algorithm.






WO 97/28669 PCT/US97/01318 

Figure 7 is a system diagram of a preferred multivariable model-based predictive 
temperature control system. 

Figures 8A and 8B illustrate an exemplary input/output identification data set for 
the center zone, showing system stimuli (B) and response (A). 
Figure 9 illustrates an exemplary system output simulation using system input
data for the center zone.

Figure 10 illustrates an exemplary residual correlation for the system center zone
input/output data set.

Figure 11 illustrates an exemplary model prediction data set compared to system
output data.

Figure 12A illustrates an exemplary command sequence and output response for
each reactor zone.

Figure 12B illustrates an exemplary input response to the command sequence of
Figure 12A.

Figure 13A illustrates exemplary data characterizing the tracking and response to
each system output variable.

Figure 13B illustrates exemplary data characterizing the tracking and response of
each system input variable to the command sequence of Figure 13A.

Figure 14A is a block diagram that illustrates an overview of a fabrication
system.

Figure 14B is a block diagram that illustrates, in greater detail than Figure 14A,
the various hardware, software, and conceptual components of a fabrication system
comprising a nonlinear, neural network based controller.

Figure 15 illustrates a block diagram of the nonlinear process model.

Figure 16 illustrates a typical neural network.

Figure 17A is a block diagram of the parallel model network.

Figure 17B is a block diagram of the series-parallel model network.

Figure 18 is a flowchart that illustrates the process for computing a new set of
predictions for u(t+k|t) and y(t+k|t) at each time-step t.

Figure 19 illustrates a simple neural network having one hidden neuron.

Figure 20 illustrates the waveforms in the single input, single output (SISO)
controller.

Figure 21 is a flowchart illustrating the steps necessary to compute the step
responses in the MIMO predictor.

Figure 22 illustrates the sigmoid function used in the neural network of Figure
16.

Figure 23 (comprising Figures 23A and 23B) is a flowchart illustrating the
pseudo least squares (PLS) procedure.

Figure 24 is a block diagram that illustrates an extension of the basic
fabrication system to a softsensor fabrication system.

Detailed Description of the Preferred Embodiments 
OVERVIEW OF RTP PROCESS CONTROL 

The model-based predictive control system of the present invention is herein
illustrated in the context of rapid thermal processing (RTP) systems, and in particular a
rapid thermal chemical vapor deposition (RTCVD) system, which itself makes
advantageous use of the superior degree of temperature uniformity provided by the
present invention. In the description and drawings, the apparatus is shown in generally
schematic fashion, and only those portions necessary to illustrate the inventive concepts
disclosed herein have been included. In particular, it is to be understood that the
apparatus is intended to be enclosed within and supported by a surrounding enclosure
(not shown) in and on which necessary gaseous reactant flow controls, process controls,
instrumentation, and other attendant mechanisms are intended to be housed and
mounted.

The RTCVD system 20 illustrated in Figure 1 comprises a reaction chamber 30
of the horizontal flow type formed of a material transparent to radiant heat energy, such
as fused quartz. The reaction chamber 30 may comprise a tubular shaft having a
cross-section defining a reactant gas flow passage 28. The substrate or wafer 22 may be
supported in the center of reaction chamber 30 by a circular, slab-like susceptor 24 held
in place by a rotatable driveshaft assembly 26 extending out of the reaction chamber 30.
The susceptor 24 is generally fabricated from a material which is opaque to the radiant
heat energy supplied from the radiant heat source, and is preferably thermally
conductive. For example, the susceptor 24 may be fabricated from a material such as
graphite. A plurality of thermocouples 44, 46, 48, 50 are embedded in the susceptor 24
for determining the local substrate temperature at predetermined positions on the
substrate 22, shown here at respective wafer locations center 44, front 46, side 48, and
rear 50. The thermocouple signals are supplied to the temperature controller discussed
below.

The radiant heating systems used for rapid wafer heating in general comprise
either arc lamps or banks of elongated tungsten-halogen lamps divided into several
independently-controllable heating zones. The radiant heat source shown in Figure 1
comprises two banks of high-power elongated tungsten-halogen lamps located above
and below the reaction chamber 30. The upper bank of lamps is oriented parallel to the
process gas flow 28 and the plurality of upper bank lamps are divided into portions
comprising a center zone 34 and two side zones 40, corresponding to their relative
proximity with respect to the wafer 22 and gas flow 28. Analogously, the lower bank of
lamps is oriented orthogonal to the process gas flow 28, and the plurality of lower bank
lamps are divided into portions comprising a center zone 32, a front zone 38 and a rear
zone 36, corresponding to their relative proximity with respect to the wafer 22 and gas
flow 28. The electrical power supplied to the lamps by lamp drivers (discussed below)
is typically controlled by a plurality of SCR power packs (discussed below) configured
to control the duty cycle or phase angle over which the electrical power is supplied to
combinations of lamps affecting specific heating zones. The SCR firing phase angle is
preferably adjusted to render a linearized power input to the lamps as done, for example,
in so-called V² or V*I modes of operation.

In operation, the substrate 22 is placed into the reaction chamber 30 and onto the
susceptor 24 at the beginning of a process cycle. A reactant gas flows through the
reaction chamber 30 in the direction indicated by the gas flow arrow 28 to deposit
materials on the substrate 22. During a process cycle, a desired sequence of thermal
process steps proceeds in concert with the reactive gas processing. The thermal
processing sequence is performed by adjusting the power level of the lamps to achieve a
desired wafer temperature at a specific time in the process cycle. The radiant heat
energy supplied to various heating zones is controlled on the basis of temperature
measurements within the respective heating zones, which information is supplied to the
temperature control system discussed below. The substrate 22 is removed from the
reaction chamber 30 upon completion of the process cycle.

As discussed earlier, the cold-wall and warm-wall reaction chambers such as that
shown in Figure 1 are inherently non-isothermal. Thus, achieving a uniform
temperature distribution is complicated by non-uniform heat flow, wafer geometry and
attendant optical properties. The position, orientation and power level of lamps shown
in Figure 1 are in principle configured to provide a uniform temperature distribution
over the wafer 22 by supplying an appropriate spatial and temporal distribution of heat
energy. The plurality of lamps comprising different zones, for example, the side zones
40, as well as those of front and back zones 38 and 36, are supplied with varying
electrical power levels comprising the multivariate control inputs. These control inputs
produce varying radiant power levels in different heating zones to affect the temperature
distribution over the substrate 22 during wafer processing. The various lamp operating
powers are adjusted by a temperature controller operating on the basis of real-time
temperature feedback provided by thermocouples 44, 46, 48 and 50 comprising the
multivariate control output. The action of the temperature control system preferably
compensates for the aforementioned non-uniform thermal characteristics of the wafer 22
and the reactor 20 to effect a uniform wafer temperature distribution.
As shown in Figure 2, an exemplary prior art multivariable temperature control
system for an RTCVD reactor may comprise a plurality of Proportional-Integral-Differential
(PID) controllers well-known in the art, and configured in a so-called
master-slave arrangement. A top view of the wafer 22 shows the relative positions of
the lamp heating zones 32, 34, 36, 38, 40 and 42 and the sensing thermocouples 44, 46,
48 and 50 with respect to the wafer 22 and the gas flow vector 28, as previously
described. The temperature sensors 44, 46, 48 and 50 are connected to supply respective
PID controllers 64, 66, 68, and 70 with signals indicative of the local wafer 22
temperature. The PID controllers 64, 66, 68 and 70 are also connected to sources of
reference signals, which supply each PID controller with a respective temperature
reference signal or set-point. In the so-called master-slave arrangement shown here, a
process controller 62 is connected to supply the center PID controller 64 with the global
or master set-point information, while the PID controllers 66, 68 and 70 are connected
and referenced to the center temperature sensor 44 of the wafer 22. The output signals
of the PID controllers 64, 66, 68 and 70 are in turn connected to respective sets of
Silicon Controlled Rectifier (SCR) power packs 84, 86, 88 and 80, which control the
lamp electrical power for respective heating zones 32/34, 36, 40/42 and 38.

In general, the PID controllers shown in Figure 2 operate to minimize the error
signals which are the differences between the respective reference temperatures and the
respective measured temperatures by a negative feedback adjustment of the respective
lamp powers. The feedback signal produced by a particular PID controller is
determined by the response characteristics of the controller and reactor, and, as such,
generally represents a considerable challenge to optimize. Several measures may be
employed to characterize the dynamic system response, such as speed of response,
accuracy, relative stability and sensitivity. For example, such a controller will provide a
feedback signal consisting of three terms: a first term proportional to the error signal, a
second term proportional to the time-integral of the error signal and a third term
proportional to the time-derivative of the error signal. All three proportionality
constants require adjustment. Under static or steady state conditions, it would be
expected that the center PID controller 64 maintain the center wafer temperature at a
predetermined reference value, and the slave PID controllers 66, 68, 70 maintain the
peripheral zones at the center zone temperature. As shown in Figure 3, the curve 90
depicts a step in the set-point wafer temperature, and the curve 92 represents the time
response of the center zone 44 to that step, indicating a stable steady-state center zone
temperature after a sufficiently long settling time period. A peripheral zone time
response is represented by the curve 94, which also displays stable steady-state behavior
at long times. However, even an optimally adjusted PID controller system is limited by
inherent time delays, characteristic response times and overshoot, as indicated by the
transient time response of the curve 92. Moreover, since the heating zones are strongly
coupled, a change in one zone will influence the transient control of other zones, at least
temporarily inducing temperature gradients as shown by the curve 96. Coupled PID
systems, such as shown in Figure 2, exacerbate the response challenge and are
commonly detuned to avoid instability at a sacrifice to wafer throughput.
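
The three-term feedback signal described above may be sketched in discrete time as
follows. This is a minimal illustration only: the class name `PID`, the gain names and
the rectangular integration and backward-difference derivative are assumptions chosen
for brevity, not details of the prior art system of Figure 2.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Hypothetical discrete-time PID controller (illustrative only)."""
    kp: float          # proportionality constant for the error term
    ki: float          # proportionality constant for the time-integral term
    kd: float          # proportionality constant for the time-derivative term
    dt: float          # sample period
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, setpoint: float, measured: float) -> float:
        """Return the feedback signal for one sample."""
        error = setpoint - measured
        self.integral += error * self.dt                  # rectangular integration
        derivative = (error - self.prev_error) / self.dt  # backward difference
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

All three gains must be tuned for each controller, and in a coupled multi-zone
arrangement the tuning of one loop affects the others, which is the difficulty noted
above.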


OVERVIEW OF MODEL-BASED PREDICTIVE CONTROLLER 

As shown in the basic block diagram of Figure 4, a thermal process reactor
incorporating a preferred embodiment of the model-based predictive control system of
the present invention utilizes heat zone temperature sensors 44, 46, 48, 50 as the
multivariate control inputs. The temperature sensors provide a model-based predictive
controller 100 with information about the state of the system, namely the zone
temperatures of the substrate 22. Based on this information the model-based predictive
controller 100 computes an optimum future control strategy comprising
electrical power inputs to the separate heat zone lamps 32, 34, 36, 38, and 40. The
process controller 62 is connected to the model-based predictive control system 100 and
provides it with the desired process temperature sequence.

The multivariate control techniques disclosed herein exhibit improved control
performance in comparison to conventional PID-type controllers because they contain
more information about the system dynamics. This information is utilized in an
Auto-Regressive Moving Average (ARMA) model, hence the name model-based predictive
control. Feedforward or predictive compensation up to a predetermined receding
prediction horizon provides improved control performance since it allows the controller
to react before a measurable disturbance has affected the system. The sequence of
control predictions is established in a recursive fashion vis-a-vis the ARMA model, thus
improving controller response speed and flexibility.

One embodiment of the control system of the present invention is described with
reference to the block diagram of Figure 5, which shows that the temperature controller
100 (Figure 4) comprises several interacting components. The overall block diagram of
the dynamic system (e.g., the controller, the reactor, the lamps and the sensors)
comprises both the controller 100 and the plant or reactor 20 for which the controller is
responsible. The reactor 20 may be exposed to uncontrolled disturbances 104 which
influence the reactor state response through disturbance signal input e(t) 124. The
disturbance signal 124 may affect the state of the reactor 20, as measured by the
plurality of process control inputs y(t) 116 (or process outputs), in this case comprising
an array of the measurements made by temperature sensors 44, 46, 48, 50 at the discrete
time variable t. The control input 116 is provided to the temperature controller 100
through the predictor 108. The temperature controller principally comprises three
interacting components: the predictor 108, the model 110, and a controller or control law
processor 112. It is supplied with a command sequence W(t) 122 from a process
controller 106 in accordance with the predefined sequence of desired process
temperatures. The predictor 108 computes a sequence of future reactor states y(t+k|t)
(120), where k is a discrete time index referenced to time t. As defined herein, a
predicted functional value f(t+k) made at time t is denoted by f(t+k|t). The predictions
y(t+k|t) are made through any formulation based on the model 126, coupled with the
control input 116 and control strategy u(t) 118. The predictor output 120 extends
forward in time from t to t+N, where N is the prediction horizon. The predictions
y(t+k|t) are reciprocally supplied as input to the control law processor 112. The control
law processor 112 computes an optimal control strategy u(t) 118 based on a
predetermined control criterion (discussed later), the supplied predictor output 120 and
the supplied command sequence W(t) 122. The optimal control strategy 118 is supplied
as a process input to a lamp driver 102 which converts the control signals 118 to
electrical power input signals P(t) 114. The lamp input signals 114 are supplied to the
reactor lamps, thereby affecting the radiant heat distribution within the reactor 20.

MODEL-BASED PREDICTIVE CONTROL ALGORITHM

The following detailed description provides a functional explanation of the
algorithm used in the model-based predictive controller. A brief derivation of the
algorithm serves to exemplify the application to temperature control in general, as well
as to the preferred embodiments of RTP temperature control. For clarity, the derivation
begins with a single-input, single-output (SISO) process model, subsequently
generalized to the multi-input, multi-output (MIMO) case.

The SISO Process Model

In this section the general formulation for the linear single-input, single-output 
(SISO) polynomial model will be described. 

A preferred SISO polynomial model has the following general form:

$$A(q^{-1})\,y(t) = \frac{B(q^{-1})}{F(q^{-1})}\,u(t) + \frac{C(q^{-1})}{D(q^{-1})}\,e(t) \qquad (1)$$

where y(t) is the control input, u(t) is the process input, e(t) is a zero-mean Gaussian
white noise sequence, t is the discrete time index (t = ..., -2, -1, 0, 1, 2, ...), $q^{-1}$ is the
backward-shift operator, $q^{-1}y(t) = y(t-1)$, and $A(q^{-1})$, $B(q^{-1})$, $C(q^{-1})$, $D(q^{-1})$, and
$F(q^{-1})$ are the polynomials

$$\begin{aligned}
A(q^{-1}) &= 1 + a_1 q^{-1} + \cdots + a_\alpha q^{-\alpha} \\
B(q^{-1}) &= b_1 q^{-1} + \cdots + b_\beta q^{-\beta} \\
C(q^{-1}) &= 1 + c_1 q^{-1} + \cdots + c_\xi q^{-\xi} \\
D(q^{-1}) &= 1 + d_1 q^{-1} + \cdots + d_\delta q^{-\delta} \\
F(q^{-1}) &= 1 + f_1 q^{-1} + \cdots + f_\eta q^{-\eta}
\end{aligned}$$

Here the polynomials $C(q^{-1})$ and $F(q^{-1})$ are asymptotically stable polynomials
with all their zeros strictly inside the unit circle, and $D(q^{-1})$ is a stable polynomial with
its zeros inside or on the unit circle. The $A(q^{-1})$ polynomial may contain unstable
process poles, and the $B(q^{-1})$ polynomial may contain nonminimum-phase zeros. The
$C(q^{-1})$ and $D(q^{-1})$ polynomials are herein defined as design polynomials. An
advantageous feature of the present preferred model formulation is the definition and
inclusion of polynomials $D(q^{-1})$ and $F(q^{-1})$. Their influence in the model behavior more
effectively decouples any correlation between the noise input e(t) and process input u(t).
It is believed that such decoupling more accurately reflects the true behavior of a
thermal process reactor.

The SISO Multistep Predictor

To facilitate the model predictions, the filtered signals $y_f(t)$ and $u_f(t)$ are defined
as

$$y_f(t) = \frac{A(q^{-1})\,D(q^{-1})}{C(q^{-1})}\,y(t) \qquad (2)$$

$$u_f(t) = \frac{B(q^{-1})\,D(q^{-1})}{F(q^{-1})\,C(q^{-1})}\,u(t) \qquad (3)$$

Consequently, Equation (1) can be rewritten as

$$y_f(t) = u_f(t) + e(t). \qquad (4)$$



Hence, another advantageous feature of the present preferred model formulation is the
definition and use of the filtered signals $y_f(t)$ and $u_f(t)$. As disclosed herein, the filtered
signals $y_f(t)$ and $u_f(t)$ provide convenient closed-form solutions for the predicted
response y(t+k|t). As previously defined, y(t+k|t) denotes the predicted value of
y(t+k) based on measurements available at time t, i.e., {y(t), y(t-1), ..., u(t-1), u(t-2), ...}
and (postulated) future values of the process input {u(t|t), u(t+1|t), ..., u(t+k|t)}. From
the expression for the filtered output at time t+k, namely

$$y_f(t+k) = u_f(t+k) + e(t+k)$$

it follows that the optimal k-step-ahead predictor is simply given as

$$y_f(t+k\,|\,t) = u_f(t+k\,|\,t), \quad \text{for } k > 0 \qquad (5)$$

where e(t) is assumed to be pure white noise. For k ≤ 0 the predictor is given by

$$y_f(t+k\,|\,t) = y_f(t+k), \quad \text{for } k \le 0 \qquad (6)$$

In terms of the unfiltered process output, Equations (5) and (6) can be written as

$$A(q^{-1})\,D(q^{-1})\,y(t+k\,|\,t) = C(q^{-1})\,y_f(t+k\,|\,t), \quad \text{for } k > 0 \qquad (7)$$

and

$$y(t+k\,|\,t) = y(t+k), \quad \text{for } k \le 0 \qquad (8)$$

Equation (8) plays an essential role in the proper initialization of the difference equation
(7). The filter $y_f(t+k|t)$ is re-initialized at each step t and gives consecutively all values
in the whole prediction range {y(t+k|t)} for k = 1...N, where N is the prediction horizon.

The structure of the predictor algorithm is substantially as that shown in the
dashed block 148 of the flow chart shown in Figure 6. The process control begins with
an initialization block 127 followed by a computation of the forced response gain vector
K 129 (to be discussed below in connection with the control law). At each time step t,
the process input y(t) and output u(t) vectors, as well as the filtered vectors $y_f(t)$ and
$u_f(t)$, are shifted in the time index as indicated by the shift block 128, in accordance with
the receding horizon formulation. The following process steps exemplify the predictor
structure:

(1) Measure y(t) at a process block 130 and store the data in a database {y(t), y(t-1),
..., u(t-1), u(t-2), ...}, as indicated by a process block 132;

(2) Postulate the future control policy {u(t|t), u(t+1|t), ..., u(t+N|t)} in a process
block 134.

The simplest assumption to make about the future process inputs is that
they will remain constant. Thus, u(t-1) = u(t|t) = u(t+1|t) = ... = u(t+N|t).

As elaborated in the next section, the assumptions made here lead to a
computation of the free response of the system, which is subsequently compared
to the desired response in order to deduce an optimal control strategy.

(3) Compute the vector of filtered inputs {$u_f(t|t)$, $u_f(t+1|t)$, ..., $u_f(t+N|t)$} in a
process block 136 in accordance with Equation (3) using:

$$\begin{aligned}
u_f(t) \equiv u_f(t\,|\,t) &= -fc_1\,u_f(t-1) - fc_2\,u_f(t-2) - \cdots \\
&\quad + bd_0\,u(t\,|\,t) + bd_1\,u(t-1) + \cdots \\
u_f(t+1\,|\,t) &= -fc_1\,u_f(t\,|\,t) - fc_2\,u_f(t-1) - \cdots \\
&\quad + bd_0\,u(t+1\,|\,t) + bd_1\,u(t\,|\,t) + \cdots \\
&\;\;\vdots \\
u_f(t+N\,|\,t) &= -fc_1\,u_f(t+N-1\,|\,t) - fc_2\,u_f(t+N-2\,|\,t) - \cdots \\
&\quad + bd_0\,u(t+N\,|\,t) + bd_1\,u(t+N-1\,|\,t) + \cdots
\end{aligned}$$

where

$$B(q^{-1})\,D(q^{-1}) = bd_0 + bd_1 q^{-1} + bd_2 q^{-2} + \cdots$$

and

$$F(q^{-1})\,C(q^{-1}) = fc_0 + fc_1 q^{-1} + fc_2 q^{-2} + \cdots$$

and where $bd_0 = 0$ (since $b_0 = 0$) and $fc_0 = 1$ (since $f_0 = 1$ and $c_0 = 1$).
Store the result in a database {$u_f(t)$} in a process block 138;
(4) Compute $y_f(t)$ in a process block 140 in accordance with Equation (2) using

$$y_f(t) = -c_1\,y_f(t-1) - c_2\,y_f(t-2) - \cdots + y(t) + ad_1\,y(t-1) + \cdots$$

where

$$A(q^{-1})\,D(q^{-1}) = ad_0 + ad_1 q^{-1} + ad_2 q^{-2} + \cdots,$$

and $ad_0 = 1$ (since $a_0 = 1$ and $d_0 = 1$);
Store the result in a database {$y_f(t)$}, as indicated by a process block 142;

(5) Set the filtered process output $y_f(t+k|t)$ equal to the filtered process input
$u_f(t+k|t)$ in a process block 144 in accordance with Equation (5):

$$\begin{aligned}
y_f(t+1\,|\,t) &= u_f(t+1\,|\,t), \\
&\;\;\vdots \\
y_f(t+N\,|\,t) &= u_f(t+N\,|\,t).
\end{aligned}$$

(6) Compute the predictions {y(t+1|t), y(t+2|t), ..., y(t+N|t)} in process block 146
from Equations (7) and (8) using:

$$\begin{aligned}
y(t+1\,|\,t) &= -ad_1\,y(t) - ad_2\,y(t-1) - \cdots \\
&\quad + y_f(t+1\,|\,t) + c_1\,y_f(t) + \cdots \\
y(t+2\,|\,t) &= -ad_1\,y(t+1\,|\,t) - ad_2\,y(t) - \cdots \\
&\quad + y_f(t+2\,|\,t) + c_1\,y_f(t+1\,|\,t) + \cdots \\
&\;\;\vdots \\
y(t+N\,|\,t) &= -ad_1\,y(t+N-1\,|\,t) - ad_2\,y(t+N-2\,|\,t) - \cdots \\
&\quad + y_f(t+N\,|\,t) + c_1\,y_f(t+N-1\,|\,t) + \cdots
\end{aligned}$$

Note that only $u_f(t)$ and $y_f(t)$ have to be saved for the next time step (t+1). All
other predicted data, indicated with (t+k|t), can be forgotten after time t. The set of
predictions y(t+k|t) is supplied to the predictive controller, described in the following
section.
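
Predictor steps (3) through (6) may be sketched in code as follows. This is an
illustrative sketch only, under stated assumptions: the function names `polymul` and
`multistep_predict`, the list-based coefficient layout (index i holds the coefficient of
the i-th power of the backward-shift operator, so B starts with a zero and A, C, D, F
start with a one), and the treatment of samples older than the supplied histories as
zero are all choices made here for illustration.

```python
def polymul(p, q):
    """Multiply two polynomials in the backward-shift operator."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def multistep_predict(A, B, C, D, F, y_past, u_past, yf_past, uf_past, u_future):
    """Return ([y(t+1|t), ..., y(t+N|t)], y_f(t), u_f(t)).

    y_past   = [y(t), y(t-1), ...]          measured outputs, newest first
    u_past   = [u(t-1), u(t-2), ...]        applied inputs, newest first
    yf_past  = [y_f(t-1), y_f(t-2), ...]    stored filtered outputs
    uf_past  = [u_f(t-1), u_f(t-2), ...]    stored filtered inputs
    u_future = [u(t|t), ..., u(t+N|t)]      postulated control policy
    """
    bd, fc, ad = polymul(B, D), polymul(F, C), polymul(A, D)
    N = len(u_future) - 1
    # Signals are stored in dicts keyed by the time offset k relative to t.
    u = {-(i + 1): v for i, v in enumerate(u_past)}
    u.update({k: v for k, v in enumerate(u_future)})
    uf = {-(i + 1): v for i, v in enumerate(uf_past)}
    y = {-i: v for i, v in enumerate(y_past)}
    yf = {-(i + 1): v for i, v in enumerate(yf_past)}

    def at(sig, k):
        return sig.get(k, 0.0)  # assumption: unknown older samples are zero

    # Step (3): filtered inputs from F*C u_f = B*D u.
    for k in range(N + 1):
        uf[k] = (sum(bd[i] * at(u, k - i) for i in range(len(bd)))
                 - sum(fc[i] * at(uf, k - i) for i in range(1, len(fc))))
    # Step (4): filtered output at time t from C y_f = A*D y.
    yf[0] = (sum(ad[i] * at(y, -i) for i in range(len(ad)))
             - sum(C[i] * at(yf, -i) for i in range(1, len(C))))
    # Step (5): predicted filtered output equals filtered input for k > 0.
    for k in range(1, N + 1):
        yf[k] = uf[k]
    # Step (6): unfiltered predictions from A*D y = C y_f.
    for k in range(1, N + 1):
        y[k] = (sum(C[i] * at(yf, k - i) for i in range(len(C)))
                - sum(ad[i] * at(y, k - i) for i in range(1, len(ad))))
    return [y[k] for k in range(1, N + 1)], yf[0], uf[0]
```

For a hypothetical first-order process y(t) = 0.5 y(t-1) + u(t-1) + e(t), that is
A = [1, -0.5], B = [0, 1] and C = D = F = [1], with y(t) = 2, u(t-1) = 1 and a constant
postulated input of 1, the sketch returns the free-response predictions
y(t+1|t) = y(t+2|t) = 2.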

The SISO Predictive Controller

The predictive controller of the present invention determines the control strategy
u(t) which minimizes the cost function H, defined as:

$$H = \sum_{k=1}^{N}\left[w(t+k\,|\,t) - y(t+k\,|\,t)\right]^2 + \lambda \sum_{k=0}^{N_u-1}\left[\Delta u(t+k\,|\,t)\right]^2, \qquad (9)$$

subject to

$$\Delta u(t+k\,|\,t) = 0 \quad \text{for } k \ge N_u, \qquad (10)$$

where w(t) is the actual set point, N is the prediction horizon, and $N_u$ is the control
horizon, Δu(t) = u(t) - u(t-1), and Δu(t+k|t) = u(t+k|t) - u(t+k-1|t). The cost function H
comprises terms quadratic in [w(t+k) - y(t+k)] and [u(t+k) - u(t+k-1)]. The set of terms
involving the control input y(t) reflects the predicted controller tracking error, which is
desirably minimized with respect to future control moves u(t+k|t). The set of terms
involving the control strategy u(t) reflects the effort to achieve a given level of tracking
error. The prefactor λ is preferably tuned to provide the desired level of controller
response. In a presently disclosed exemplary embodiment, λ = 0.
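
As a small illustration of how the two groups of quadratic terms trade off, the cost
may be evaluated as sketched below. The function name `predictive_cost` and its
argument layout are assumptions made for this sketch: `w` and `y_pred` hold the set
points and predictions over the prediction horizon, `du` holds the control moves over
the control horizon, and `lam` is the prefactor weighting the control effort.

```python
def predictive_cost(w, y_pred, du, lam):
    """Quadratic tracking error plus weighted quadratic control effort."""
    tracking = sum((wk - yk) ** 2 for wk, yk in zip(w, y_pred))
    effort = lam * sum(d ** 2 for d in du)
    return tracking + effort
```

With the prefactor set to zero, as in the exemplary embodiment, only the tracking
error contributes to the cost.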
Since the model of the system is linear, the future response y(t+k|t) can be
considered as a superposition of two separate contributions:

$$y(t+k\,|\,t) = y_0(t+k\,|\,t) + y_p(t+k\,|\,t).$$

Here, the free response, $y_0(t+k|t)$, is the result of past process inputs {u(t-1), u(t-2), ...}
assuming that all future control moves are zero (i.e., Δu(t|t) = Δu(t+1|t) = ... = 0, or
equivalently, u(t|t) = u(t-1), u(t+1|t) = u(t), ...), and of the disturbances acting on the
system. The free response is computed with the procedure given in the previous section,
using the prediction horizon N, and u(t|t) = u(t+1|t) = ... = u(t+N|t) = u(t-1).

The forced response, $y_p(t+k|t)$, is the result of future control moves Δu(t|t),
Δu(t+1|t), ..., Δu(t+$N_u$-1|t). It is the effect of a sequence of step inputs to the
system: a step with amplitude Δu(t|t) at time t, resulting in a contribution $g_k$Δu(t|t) to
the predicted output at time (t+k), plus a step with amplitude Δu(t+1|t) at time (t+1),
resulting in a contribution $g_{k-1}$Δu(t+1|t) to the predicted output at time (t+k),
etc. The total effect is thus

$$y_p(t+k\,|\,t) = g_k\,\Delta u(t\,|\,t) + g_{k-1}\,\Delta u(t+1\,|\,t) + \cdots + g_{k-N_u+1}\,\Delta u(t+N_u-1\,|\,t)$$

where

$$G(q^{-1}) = g_0 + g_1 q^{-1} + g_2 q^{-2} + \cdots$$

is the step response of the system $B(q^{-1})/(A(q^{-1})F(q^{-1}))$. Since $b_0 = 0$, then $g_0 = 0$.
Moreover $g_k = 0$ for k < 0. Using matrix notation and assuming N ≥ $N_u$ results in the
following expression for the vector of forced response contributions to the predictions:
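
The step response coefficients and the resulting forced response may be computed as
sketched below. This is a sketch under assumptions: the function names
`step_response` and `forced_response` are hypothetical, `B` and `AF` are coefficient
lists in ascending powers of the backward-shift operator, and `AF` (the product of the
A and F polynomials) is assumed to have a leading coefficient of one.

```python
def step_response(B, AF, n):
    """Return [g_0, ..., g_n], the unit step response of B/(A*F)."""
    g = []
    for k in range(n + 1):
        # The unit step is 1 for time >= 0, so every B term with k - i >= 0 counts.
        acc = sum(B[i] for i in range(len(B)) if k - i >= 0)
        acc -= sum(AF[i] * g[k - i] for i in range(1, len(AF)) if k - i >= 0)
        g.append(acc)
    return g


def forced_response(g, du, N):
    """Return [y_p(t+1|t), ..., y_p(t+N|t)] for moves du = [du(t|t), du(t+1|t), ...]."""
    return [sum(g[k - j] * du[j] for j in range(len(du)) if k - j >= 0)
            for k in range(1, N + 1)]
```

Because the first coefficient of B is zero, `step_response` returns a zero first
coefficient, so a control move made at time t first influences the prediction at time
t+1.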
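$$\begin{bmatrix} y_p(t+1\,|\,t) \\ y_p(t+2\,|\,t) \\ \vdots \\ y_p(t+N\,|\,t) \end{bmatrix}
=
\begin{bmatrix}
g_1 & 0 & \cdots & 0 \\
g_2 & g_1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
g_N & g_{N-1} & \cdots & g_{N-N_u+1}
\end{bmatrix}
\begin{bmatrix} \Delta u(t\,|\,t) \\ \Delta u(t+1\,|\,t) \\ \vdots \\ \Delta u(t+N_u-1\,|\,t) \end{bmatrix}$$

In matrix notation, the vector of predicted errors can be written as:

$$\begin{bmatrix} w(t+1\,|\,t) - y(t+1\,|\,t) \\ \vdots \\ w(t+N\,|\,t) - y(t+N\,|\,t) \end{bmatrix}
=
\begin{bmatrix} w(t+1\,|\,t) - y_0(t+1\,|\,t) \\ \vdots \\ w(t+N\,|\,t) - y_0(t+N\,|\,t) \end{bmatrix}
-
\begin{bmatrix}
g_1 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
g_N & g_{N-1} & \cdots & g_{N-N_u+1}
\end{bmatrix}
\begin{bmatrix} \Delta u(t\,|\,t) \\ \vdots \\ \Delta u(t+N_u-1\,|\,t) \end{bmatrix}$$
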

Or, equivalently, with obvious definitions for the newly introduced variables, as

$$W - Y = W - Y_0 - GU \qquad (12)$$

In the same fashion, and at the same time using Equation (12), the cost function (9) can
be written as

$$H = (W-Y)^T(W-Y) + \lambda U^T U = \left[(W-Y_0) - GU\right]^T\left[(W-Y_0) - GU\right] + \lambda U^T U$$

Minimizing H with respect to U gives the solution

$$U^* = (G^T G + \lambda I)^{-1} G^T (W - Y_0) \qquad (13)$$

Only the first element of U* is actually required to compute the control input:

$$u(t) = u(t-1) + \Delta u(t\,|\,t)$$

At the next time slot (t+1), the whole procedure is repeated, taking into account the new
measurement information y(t+1) and new set point data w(t+N+1), in accordance with
the receding horizon principle.

Denoting the first row of $(G^T G + \lambda I)^{-1} G^T$ by K, the control law is given by

$$u(t) = u(t-1) + K(W - Y_0) \qquad (14)$$

The gain vector K is computed in accordance with the foregoing matrix expression.
Note that this gain vector has to be computed only once in the non-adaptive case, i.e.,
the case in which the model parameters remain fixed. This computation can be done in
the initialization phase of the algorithm as previously mentioned and shown in process
block 128 of Figure 6. Alternatively, the gain vector can be pre-computed off-line and
stored in memory. Adaptive extension of the foregoing control law would, in essence,
provide for periodic adjustment of the gain vector K.
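
A one-time computation of the gain vector, followed by the per-step control law, may
be sketched as follows. The sketch is illustrative and rests on assumptions: the
function names are hypothetical, `g` is a list of step response coefficients starting
at the zeroth coefficient, and a small Gaussian elimination routine stands in for
whatever linear algebra facility an actual implementation would use.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def dynamic_matrix(g, N, Nu):
    """N x Nu lower-triangular matrix G with entries g[k - j], rows k = 1..N."""
    return [[g[k - j] if k - j >= 0 else 0.0 for j in range(Nu)]
            for k in range(1, N + 1)]


def gain_vector(g, N, Nu, lam):
    """First row of (G^T G + lam I)^-1 G^T."""
    G = dynamic_matrix(g, N, Nu)
    M = [[sum(G[k][i] * G[k][j] for k in range(N)) + (lam if i == j else 0.0)
          for j in range(Nu)] for i in range(Nu)]
    # M is symmetric, so the first row of its inverse solves M z = e_1.
    z = solve(M, [1.0] + [0.0] * (Nu - 1))
    return [sum(z[j] * G[k][j] for j in range(Nu)) for k in range(N)]


def control_move(u_prev, K, w, y0):
    """Control law: u(t) = u(t-1) + K (W - Y_0)."""
    return u_prev + sum(Kk * (wk - yk) for Kk, wk, yk in zip(K, w, y0))
```

In the non-adaptive case `gain_vector` would be called once during initialization,
after which each time step costs only the inner product in `control_move`.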

A dashed portion 166 of the flow chart in Figure 6 corresponds to the predictive
controller and is supplied with the process output predictions y(t+k|t) 120 generated in
the dashed portion 148. Because the postulated future control input u(t+k|t) is assumed
constant and equal to u(t-1) (process block 134), the predicted output y(t+k|t) is
equivalent to the future free response of the system $y_0(t+k|t)$. In a process block 150,
the system free response is set to the previously computed predictions y(t+k|t) (block
146). The system free response is supplied to a process block 152, along with the
current set point information from a block 154. At the process block 152, the optimum
process control input U*(t) is computed using $y_0(t+k|t)$, W(t), u(t-1) and the gain vector
K initially computed in block 128. The optimum control input U*(t) is used to adjust the
lamp drivers at time = t in a process block 158. Additionally, the value of U*(t) is
incorporated in the process input matrix {u(t)} in the block 156 which is subsequently
supplied to the process block 134 in preparation for the next time-step operation.
Following the lamp bank control adjustment in the block 158, a decision block 162 may
test to determine whether the process cycle is complete. If not, then a time-step
increment is made in a block 160, which then shifts the set-point matrix in the block
154, as well as process input/output matrix at the block 129.

The MIMO Predictive Controller

It will be appreciated that the formulation of the model-based predictive control
algorithm for multi-input, multi-output (MIMO) systems is an extension of the SISO
case. Those skilled in the art of control systems will know how to extend the previously
described computational formalism to multivariant systems.

The MIMO control systems modeled by the methods of the present invention are
those characterized by a plurality of input $u_i(t)$ variables and output $y_j(t)$ variables, where
the variable indices i, j run up to the number of respective input and output variables m,
n. Each output of the MIMO system is related to all inputs via a dynamic relationship of the
form (1):

$$A_j(q^{-1})\,y_j(t) = \sum_{i=1}^{m} \frac{B_{ji}(q^{-1})}{F_{ji}(q^{-1})}\,u_i(t) + \frac{C_j(q^{-1})}{D_j(q^{-1})}\,e_j(t), \quad \text{for } j = 1 \ldots n. \qquad (15)$$

Here, m denotes the number of inputs and n denotes the number of outputs. Both m and
n are four in the case of the exemplary RTCVD system shown in Figure 1. The MIMO
multistep predictor is conveniently considered as a consecutively applied predictor of a
multi-input, single-output (MISO) model. Therefore, Equations (15) can be considered
as a set of coupled MISO models. Defining the filtered signals as

$$(y_f)_j(t) = \frac{A_j(q^{-1})\,D_j(q^{-1})}{C_j(q^{-1})}\,y_j(t) \qquad (17)$$

and

$$(u_f)_{ji}(t) = \frac{B_{ji}(q^{-1})\,D_j(q^{-1})}{F_{ji}(q^{-1})\,C_j(q^{-1})}\,u_i(t) \qquad (18)$$

the filtered process output signal is written as:

$$(y_f)_j(t) = \sum_{i=1}^{m} (u_f)_{ji}(t) + e_j(t), \quad j = 1 \ldots n \qquad (19)$$

analogous to that shown in Equation (4).

Thus, the k-step-ahead predictor for the j-th process output is given by

$$(y_f)_j(t+k\,|\,t) = \sum_{i=1}^{m} (u_f)_{ji}(t+k\,|\,t), \quad k > 0 \qquad (20)$$

$$(y_f)_j(t+k\,|\,t) = (y_f)_j(t+k), \quad k \le 0$$

Similarly, the MISO equivalent of Equations (7) and (8) is given by

$$A_j(q^{-1})\,D_j(q^{-1})\,y_j(t+k\,|\,t) = C_j(q^{-1})\,(y_f)_j(t+k\,|\,t), \quad \text{for } k > 0$$

$$y_j(t+k\,|\,t) = y_j(t+k), \quad \text{for } k \le 0$$

The action produced by the MIMO predictive controller preferably minimizes the
multivariant cost function analogous to Equations (9) and (10):

$$H = \sum_{j=1}^{n}\sum_{k=1}^{N}\left[w_j(t+k\,|\,t) - y_j(t+k\,|\,t)\right]^2 + \lambda \sum_{i=1}^{m}\sum_{k=0}^{N_u-1}\left[\Delta u_i(t+k\,|\,t)\right]^2 \qquad (24)$$

with respect to $\Delta u_i(t+k|t)$ and subject to:

$$\Delta u_i(t+k\,|\,t) = 0 \quad \text{for } k \ge N_u,\ i = 1 \ldots m \qquad (25)$$

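Introducing the following notation for the step response coefficients related to input i
and output j

$$G_{ji}(q^{-1}) = g^{ji}_0 + g^{ji}_1 q^{-1} + g^{ji}_2 q^{-2} + \cdots \qquad (26)$$

the forced response of output j due to postulated future variations of the control inputs

$$\{\Delta u_i(t\,|\,t),\ \Delta u_i(t+1\,|\,t),\ \ldots,\ \Delta u_i(t+N_u-1\,|\,t),\quad i = 1 \ldots m\}$$

can be written as:

$$\begin{bmatrix} y_{jp}(t+1\,|\,t) \\ \vdots \\ y_{jp}(t+N\,|\,t) \end{bmatrix}
= \sum_{i=1}^{m}
\begin{bmatrix}
g^{ji}_1 & 0 & \cdots & 0 \\
g^{ji}_2 & g^{ji}_1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
g^{ji}_N & g^{ji}_{N-1} & \cdots & g^{ji}_{N-N_u+1}
\end{bmatrix}
\begin{bmatrix} \Delta u_i(t\,|\,t) \\ \vdots \\ \Delta u_i(t+N_u-1\,|\,t) \end{bmatrix}$$

with similar expressions for the other zones. The vector of predicted errors for the first
process output in the time frame of interest can now be written as:

$$\begin{bmatrix} w_1(t+1\,|\,t) - y_1(t+1\,|\,t) \\ \vdots \\ w_1(t+N\,|\,t) - y_1(t+N\,|\,t) \end{bmatrix}
=
\begin{bmatrix} w_1(t+1\,|\,t) - y_{10}(t+1\,|\,t) \\ \vdots \\ w_1(t+N\,|\,t) - y_{10}(t+N\,|\,t) \end{bmatrix}
- \sum_{i=1}^{m}
\begin{bmatrix}
g^{1i}_1 & 0 & \cdots & 0 \\
g^{1i}_2 & g^{1i}_1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
g^{1i}_N & g^{1i}_{N-1} & \cdots & g^{1i}_{N-N_u+1}
\end{bmatrix}
\begin{bmatrix} \Delta u_i(t\,|\,t) \\ \Delta u_i(t+1\,|\,t) \\ \vdots \\ \Delta u_i(t+N_u-1\,|\,t) \end{bmatrix}$$

or, equivalently, using matrix notation by analogy to Equation (12),

$$W_1 - Y_1 = W_1 - Y_{10} - \sum_{i=1}^{m} G_{1i}\,U_i \qquad (27)$$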
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with similar expressions for the other process outputs. Using the same notation, the cost 
function (24) can be written as: 



(28) 



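The general solution to the minimization of Equation (28), subject to the criteria of 
Equation (27) and similar equations for the other process outputs, is found to be 

$$U^{*} = \left[\, \sum_{j=1}^{n} G_j^{T} G_j + \lambda I \,\right]^{-1} \sum_{j=1}^{n} G_j^{T}\,(W_j - Y_{j,0}) \qquad (29)$$

with I the identity matrix of appropriate dimension, and 

$$G_j = [\,G_{j1}\ \ G_{j2}\ \ \cdots\ \ G_{jm}\,], \qquad U = [\,U_1^{T}\ \ U_2^{T}\ \ \cdots\ \ U_m^{T}\,]^{T}$$

Finally, the control output is calculated via 

(30) 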
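The unconstrained solution can be illustrated numerically. The sketch below assumes that Equation (29) has the regularized least-squares form U* = [sum_j G_j^T G_j + lam I]^-1 sum_j G_j^T (W_j - Y_j0), with a scalar move-weighting factor called `lam` here. The helper names, the first-order step-response coefficients, and the 0.2 cross-coupling factor are hypothetical values chosen for illustration and are not reactor data.

```python
import numpy as np

def step_response_matrix(g, n_pred, n_ctrl):
    """Lower-triangular Toeplitz matrix from step-response coefficients.

    g[0] is the first coefficient (response one sample after a unit step).
    """
    G = np.zeros((n_pred, n_ctrl))
    for row in range(n_pred):
        for col in range(min(row + 1, n_ctrl)):
            G[row, col] = g[row - col]
    return G

def optimal_moves(G_blocks, W, Y0, lam):
    """Solve the regularized least-squares problem for the stacked moves U.

    G_blocks[j][i] maps the moves of input i to the predictions of output j.
    W[j] and Y0[j] are the setpoint and free-response vectors of output j.
    """
    n_out = len(G_blocks)
    G = [np.hstack(G_blocks[j]) for j in range(n_out)]  # G_j = [G_j1 ... G_jm]
    dim = G[0].shape[1]
    lhs = lam * np.eye(dim)
    rhs = np.zeros(dim)
    for j in range(n_out):
        lhs += G[j].T @ G[j]
        rhs += G[j].T @ (W[j] - Y0[j])
    return np.linalg.solve(lhs, rhs)

# Hypothetical two-zone example with a weak cross-coupling between zones.
g = [1.0 - 0.8 ** (k + 1) for k in range(5)]
G = step_response_matrix(g, n_pred=5, n_ctrl=2)
blocks = [[G, 0.2 * G], [0.2 * G, G]]
W = [np.ones(5), np.ones(5)]
Y0 = [np.zeros(5), np.zeros(5)]
U_star = optimal_moves(blocks, W, Y0, lam=0.1)
first_moves = U_star[::2]  # first element of U_1 and of U_2
```

In a receding-horizon scheme only the first move of each input would be applied before the optimization is repeated at the next sample.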



In practice, exemplary model parameters may comprise multi-input, 
multi-output (MIMO) 3rd order polynomial model coefficients defined by 

with 

$$A_j = 1, \qquad D_j = 1 - q^{-1}, \qquad C_j = (1 - C_1 q^{-1})(1 - C_2 q^{-1})$$

for all j, and n = m = 4 for the exemplary embodiment described earlier. Empirical 
testing of a particular reactor will determine the most appropriate values for the 
coefficients as outlined below. 

RAPID THERMAL PROCESS PREDICTIVE CONTROL SYSTEM 

The foregoing description of a preferred model and algorithm for a multivariable 
model-based predictive control system is general in nature. It can be applied to a variety 
of systems having input/output relationships characterized by a suitably accurate model 
implemented in an ARMA fashion. The long-range predictive nature of the model-based 
control algorithm provides fast response and robust behavior in addition to the 
flexibility afforded by the ARMA model. 

The following system description incorporates the foregoing algorithm, model 
and model implementation to provide static and dynamic temperature uniformity control 
in rapid thermal processing reactors. 

As shown in Figure 7, a multivariable temperature control system for a rapid 
thermal process reactor comprises a temperature sensor array disposed within the 
process reactor 20. The temperature sensors may comprise thermocouples or other such 
equivalents. In the present embodiments, thermocouples 180, 182, 184, and 186 or 
other such temperature sensors are connected to the susceptor 24 as previously described 
in Figure 1. The temperature sensors 180, 182, 184, 186 are each connected to a data 
bus via input/output devices such as buffer amplifiers and analog-to-digital (A/D) 
converters 188, 190, 192 and 194. The temperature sensor input/output devices 188, 
190, 192 and 194 are preferably housed in a temperature data acquisition assembly 172 
and are located in the vicinity of the reactor 20 to minimize measurement error. The 
outputs of the A/D converters 188, 190, 192, 194 are connected to a data bus 195 which 
in turn connects to an input/output port 167 of the system temperature controller 170. 
The temperature controller 170 comprises a processor 165, a data storage device 169, 
and data input/output devices 167, 168 which provide hardware/software 
implementation of the foregoing model-based predictive control algorithm. The outputs 
of system controller 170 are connected to a plurality of lamp drivers 174 via a data bus 
198 and provide the lamp drivers with their respective control signals U*(t). As 
previously mentioned, the plurality of lamp drivers may comprise a bank of SCR power 
regulators configured in a predetermined manner to supply electrical power to the 
plurality of lamps in reactor 20. Preferably, the SCRs and lamps are connected to 
supply radiant energy to the plurality of reactor heat zones in accordance with the 
preferred radiant heat distribution within the reactor 20. The lamp driver outputs P(t) 
200 are connected to the lamps in accordance with this plan, thereby completing the 
temperature control loop. 

In operation, the temperature sensors 180, 182, 184 and 186 provide analog 
signals indicative of the wafer temperature in respective zones center, side, front and 
rear. As shown in Figure 7, the analog signals are filtered (buffered) and converted to 
digital signals by the respective A/D converters 188, 190, 192 and 194. The digitized 
temperature information Y(t) is transmitted via the data bus 196 to the system controller 
170 which computes the optimal control strategy U*(t) using the foregoing model-based 
predictive control algorithm and dynamic system model. The information necessary for 
future processing, namely Y(t) and U*(t), is retained in the controller data storage device. 
The system controller 170 transmits the control input U*(t) via the data bus 198 to the 
lamp driver assembly 174 whereupon the control signals U*(t) are distributed to the 
appropriate SCR packs 171, 173, 175. The SCRs convert the control signals U*(t) to the 
lamp drive signals P(t) as previously discussed in connection with the prior art system of 
Figure 2. The lamp drive signals P(t) are transmitted to and distributed among the lamp 
banks in reactor 20 via the bus 200. The lamp banks and lamp drive signals are 
configured spatially and temporally, in part by the temperature controller 170, to provide 
a predetermined spatial and temporal temperature profile over wafer 22. 

REACTOR MODEL IDENTIFICATION AND PARAMETERIZATION 

The present section discloses exemplary identification and modeling procedures 
in order to arrive at a model that accurately describes the dynamics of a multivariable 
rapid thermal reactor. The ensuing model resides at the core of the model-based 
predictive temperature control system of the present invention. The test 
arrangement and conditions are first described, after which the model structure and order 
selection procedures are discussed. The model is then presented along with exemplary 
model validation. 

Modeling and Identification 

For modeling and identification, a PC-based Data Acquisition and Control 
(DA&C) system (not shown) is connected to the RTCVD reactor. A software-based 
system is used to provide the interface between the DA&C hardware and the user. The 
PC is used to control the temperature in the reactor, for example, by using a 
conventional software-based PID algorithm. The DA&C system is also capable of 
injecting stimuli, in the form of appropriate test signals, into the system in open-loop 
mode and detecting the response of the temperature sensors. This open-loop mode 
comprises a substantial portion of the system operation during the identification 
experiments. The inputs to the system, such as SCR drive signals, and the outputs, such 
as thermocouple readings, are stored in a data file. Analysis of the signals and modeling 
are performed off-line using software-based analysis familiar to those skilled in the art 
of model identification. The identification experiments will result in a model for the 
transfer function from the four control signals for the center, front, side, and rear zones 
to the center 44, front 46, side 48, and rear 50 thermocouples. 

Identification experiments on the RTCVD reactor are conducted at atmospheric 
pressure and at a temperature between 600°C and 800°C, which is a typical temperature 
range for polysilicon deposition. The controller zone ratio settings are optimized for 
steady-state uniformity at 650°C and are maintained constant during the experiment. 
The system is set for 6" wafer processing. A nitrogen purge flow of 20 slm is used 
throughout the experiment. Identification experiments are also performed in H2 
ambients both at 1 atm and reduced pressure at about 200°C for typical epitaxial 
deposition conditions. The lamp-bank configuration may be adjusted and in general 
may differ from that previously shown in Figure 2 in terms of zone distribution and 
lamp power. Those skilled in the art of reactor design will appreciate that a variety of 
lamp bank distributions are possible. In particular, an exemplary lamp distribution may 
have all lamps operating at the same nominal power rating of between 3 kW and 7 kW, 
with some modification in the distribution of SCR lamp drivers to lamp heating zones. 
Additionally, the SCR/lamp wiring may differ between zones to facilitate the power 
distribution between lamps. The preferred lamp bank distribution, power and wiring 
will in general depend on the desired thermal processing and reactor geometry. For the 
purposes of the present preferred embodiments, the preferred design criteria result in a 
lamp bank configuration having better controllability of the peripheral zones and having 
reduced temperature differences across the wafer as well as between wafer and 
susceptor. 

Careful experimental design for dynamic system identification is paramount to 
obtaining a good model. Several design variables must be considered: the type and 
shape of input signal, its spectrum, the sample rate, the number of samples, and the 
antialiasing presampling filters. Essentially, the experiment must be designed such that 
it is informative, i.e., that it provides the experimenter with the desired information 
about the system. For an experiment to be informative, the input stimuli must be 
persistently exciting. Basically, this means that the input signals must have enough 
spectral content to excite all relevant modes of the system. A detailed treatment of 
system identification and experiment design is provided in L. Ljung, System 
Identification: Theory for the User, Prentice-Hall, Englewood Cliffs, New Jersey 
(1987). Classical system identification makes use of step signals, pulses or sine waves 
as test signals for identification purposes. The modern equivalent of these signals for 
identification of multivariable systems is the Pseudo-Random Binary Signal (PRBS), 
having a signal level that alternates between two levels at random times. In the 
exemplary test shown here, the PRBSs are allocated peak-to-peak amplitudes of about 
1.5 V in order to provide sufficient system excitation. Mean signal levels are chosen to 
correspond to the steady-state controller output voltage levels corresponding to a 
temperature of about 650°C. The sampling rate is taken to be about 0.5 Hz. A one-hour 
run is recorded. The resulting data set is split in two, the first half being used for 
identification purposes and the second half for model validation purposes. DC offsets 
are eliminated from all input and output signals. 
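As a rough illustration of the test-signal design described above, the following Python sketch generates a two-level signal that switches at random times, then splits the record and removes the DC offset. Only the 1.5 V peak-to-peak amplitude, the 0.5 Hz sampling rate, the one-hour duration, and the half-and-half split come from the text. The function name `make_prbs`, the 3.0 V mean level, and the 0.1 switching probability are assumptions made for this sketch.

```python
import numpy as np

def make_prbs(n_samples, mean_level, peak_to_peak, switch_prob, rng):
    """Two-level signal whose level flips at random sample instants."""
    level = 1.0
    out = np.empty(n_samples)
    for k in range(n_samples):
        if rng.random() < switch_prob:
            level = -level  # jump to the other level
        out[k] = mean_level + 0.5 * peak_to_peak * level
    return out

rng = np.random.default_rng(0)
sample_rate_hz = 0.5
n_samples = int(3600 * sample_rate_hz)  # one-hour run at 0.5 Hz

# mean_level and switch_prob are hypothetical; the real mean level would be
# the steady-state controller output for about 650 degrees C.
u = make_prbs(n_samples, mean_level=3.0, peak_to_peak=1.5,
              switch_prob=0.1, rng=rng)

# First half for identification, second half for validation.
u_ident, u_valid = u[: n_samples // 2], u[n_samples // 2:]

# Remove the DC offset before fitting a model.
u_ident_zero_mean = u_ident - u_ident.mean()
```

One such signal would be generated per zone, each with its own random switching pattern, so that the four inputs are not correlated with one another.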

An exemplary input/output identification data set for the center zone is shown in 
Figures 8A and 8B, showing the first 200 seconds of system stimuli (Figure 8B) and 
response (Figure 8A). Corresponding identification data sets for the front, side, and rear 
zones are obtained in the same manner and display substantially similar characteristics. 

Reactor Model Structure 

Once the identification data set has been collected, the next step is to choose a 
model structure. Generally, this involves three steps: 

1. Choosing the type of model set (e.g., linear or nonlinear, input-output, 
black-box or physically parameterized state-space models). 

2. Choosing the size of the model set. This is called the model-order 
selection and determines the number of free parameters in the model 
description. 

3. Choosing the model parameterization. That is, choosing the positions of 
the free parameters in the selected model structure. 

The choice of model structure will likely involve a trade-off between flexibility 
and parsimony. A higher-order model will be more flexible, but may lead to 
unnecessarily many parameters being used to describe the true system. Moreover, a 
high-order model is more difficult for on-line use in a model-based controller. The 
principles and guidelines for system modeling are well-known to those skilled in the art 
of system control. Again, for a more in-depth treatment of the topic of model structure 
selection one is referred to Ljung (1987). 

As described above in Section III.D, the present embodiment of the multi-input, 
multi-output, model-based predictive controller utilizes a multi-input, multi-output 
polynomial model in an auto-regressive moving average representation in Equation (15). 
The model is advantageously considered as a set of coupled linear multi-input, single- 
output polynomials which allow convenient description of the filtered process signals 
$(y_f)_j$ and $(u_f)_{ij}$ (see Equations (17) and (18)). 

The exemplary model parameters provided in Table I below refer to multi-input, 
multi-output (MIMO) 3rd order polynomial model coefficients defined by 

with n = m = 4, and 

$$A_j = 1, \qquad D_j = 1 - q^{-1}, \qquad C_j = (1 - C_1 q^{-1})(1 - C_2 q^{-1})$$

for all j. 

0.8520], 

In the present exemplary system, i and j may correspond to the zone number (i.e., 1 = 
center, 2 = front, 3 = side, 4 = rear). 

Reactor Model Validation 

Once a model structure has been selected and a parameterization has been found, 
the proposed model is preferably validated. Standard techniques for model validation 
include simulation, residual analysis, and cross-correlation analysis. 

In simulation, usually a fresh data set is used, i.e., data from the real system that 
was not used in the identification phase. The model is fed with the same inputs as the 
actual system and a comparison is made between model outputs and system outputs. 
Such an exemplary comparison is made in Figure 9, again for the center zone, using the 
data of the last 30 minutes of the experiment which were not used for model building 
purposes. In Figure 9, both model output 302 and system output 300, in this case the 
center thermocouple reading after subtraction of the steady-state value, are plotted 
versus time (measured in samples, where the sampling interval is a fixed time interval). 
A measure of fit is derived from curves 300 and 302. The curves shown have a mean- 
square-deviation of about 3.5, where a lower value indicates a better fit. Corresponding 
validation for the front, side, and rear zones should obtain substantially the same degree 
of fit. 
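The measure of fit mentioned above can be computed in a few lines. The sketch below assumes that "mean-square-deviation" means the average of the squared differences between the simulated model output and the measured system output; the source does not spell out the formula, so this definition and the function name are assumptions.

```python
import numpy as np

def mean_square_deviation(model_output, system_output):
    """Average squared difference between model and measured outputs."""
    model_output = np.asarray(model_output, dtype=float)
    system_output = np.asarray(system_output, dtype=float)
    return float(np.mean((model_output - system_output) ** 2))

# Toy numbers, not reactor data: only the last sample differs (by 2).
msd_example = mean_square_deviation([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Both series should have their steady-state values subtracted first, as is done for the curves of Figure 9.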

Residual analysis is used to check whether there is any structural information left 
unexplained by the model. Ideally, the residuals (difference between model predictions 
and system output) should be white or random with time and independent of the inputs 
for the model to correctly describe the system. The curve 304 in Figure 10 shows the 
correlation function of the residual for the center zone output for time lags up to 25 
sampling intervals. Dotted lines indicate 99% confidence limits, assuming the residuals 
are indeed white. Cross-correlation between system inputs and residuals should also 
show a zero mean with an RMS deviation staying well below the 99% confidence. Such 
behavior, as indicated by a curve 306 in Figure 10, should be observed for all cross- 
correlated quantities, which indicates there is no significant systematic unaccounted 
input/output correlation. 
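A whiteness check of this kind can be sketched as follows. The code computes the normalized autocorrelation of a residual sequence for lags up to 25 and compares it with the usual large-sample 99% band of plus or minus 2.576 divided by the square root of N, which holds when the residuals are white. The residuals here are synthetic random numbers standing in for real model residuals, and the function names are illustrative.

```python
import numpy as np

def autocorrelation(residuals, max_lag):
    """Normalized autocorrelation estimates for lags 0..max_lag."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    denom = np.dot(r, r)
    return np.array([np.dot(r[: len(r) - k], r[k:]) / denom
                     for k in range(max_lag + 1)])

def whiteness_bound(n_samples, z=2.576):
    """Approximate 99% confidence limit for a white sequence of length N."""
    return z / np.sqrt(n_samples)

rng = np.random.default_rng(1)
residuals = rng.standard_normal(900)  # synthetic stand-in for model residuals

rho = autocorrelation(residuals, max_lag=25)
bound = whiteness_bound(len(residuals))

# Number of nonzero lags whose correlation leaves the confidence band.
n_outside = int(np.sum(np.abs(rho[1:]) > bound))
```

The same band can be applied to the cross-correlation between each input and the residuals; values that stay inside it suggest that no systematic input/output relationship was left unmodeled.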

As a final test for the model validation, the model is used to predict 
thermocouple readings using information on past inputs and outputs. A fresh data set, 
as used in Figure 9, is also used in the present comparison shown in Figure 11. 
Figure 11 shows the system output (center thermocouple) and the one-minute ahead 
predictions of the system output made using the model predictor. Notice that the 
predictive capabilities of the model are excellent. Prediction results for the front, side, 
and rear zones (not included) show similar behavior. 

Using identification and verification techniques described herein, the model 
described above has been found to provide a very accurate description of system 
dynamics for an exemplary RTP reactor at atmospheric pressure and in a temperature 
range of 600-800°C. The ARMAX model is shown to have predictive capabilities 
particularly advantageous for the present preferred embodiment of a model-based 
predictive controller. The look-ahead feature of the model can be used, for instance, to 
minimize overshoot, thus improving recovery time and minimizing recipe cycle times. 
It will be appreciated that the precise form of the model can vary appreciably without 
departing from the spirit and scope of the present invention. In general, the model form 
will be dictated by demands on a variety of factors including flexibility, accuracy, 
sensitivity, robustness and speed. One alternative preferred embodiment is to reduce the 
model order for minimizing computational overhead, without significant loss of 
accuracy. Additional preferred embodiments comprise: 

— Extending the predictive controller to include adaptive behavior, 
whereby model parameters are themselves subject to real-time assessment and 
modification. 

— Utilizing constraint input optimization. The optimal control strategy 
(29) does not take into account constraints on input energy to the system 
(linearity assumption). This may lead to less-than-optimal behavior during fast 
heat-up and cool-down. This situation is improved by checking the proposed 
control moves for constraint violations. If a control move violates a constraint, it 
is set to the limit value and the remaining "free" future moves are recomputed. 
This process is iterative and ends when all future moves are at their limit value 
or an iteration no longer adds new constrained moves. This simple new technique 
is substantially easier to implement than the conventional quadratic 
programming solution. 

— Extending the linear model to a nonlinear model, preferably by 
utilizing neural networks to model the static gain (nonlinear) in series with the 
ARMAX model. 
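The iterative clipping procedure in the second item above can be sketched as follows. This is a simplified illustration under stated assumptions: a single stacked matrix `G`, a target vector `r` (setpoint minus free response), a scalar move weight `lam`, and simple lower and upper limits applied directly to the moves. The names and the limit values are hypothetical and the sketch is not the patented implementation.

```python
import numpy as np

def clipped_moves(G, r, lam, lower, upper):
    """Clip violating moves to their limit, then recompute the free ones."""
    n = G.shape[1]
    u = np.zeros(n)
    fixed = np.zeros(n, dtype=bool)  # True for moves pinned at a limit
    while True:
        free = ~fixed
        if not free.any():
            break  # every move sits at a limit
        Gf = G[:, free]
        # Remove the contribution of the pinned moves from the target.
        target = r - G[:, fixed] @ u[fixed]
        lhs = Gf.T @ Gf + lam * np.eye(int(free.sum()))
        u[free] = np.linalg.solve(lhs, Gf.T @ target)
        violated = free & ((u < lower) | (u > upper))
        if not violated.any():
            break  # no new constrained moves were added
        u[violated] = np.clip(u[violated], lower, upper)
        fixed |= violated
    return u

# Toy example: the unconstrained answer is [2, 0]; limiting the moves to
# [-1, 1] pins the first move at 1 and the second is recomputed as 1.
G_demo = np.array([[1.0, 0.0], [1.0, 1.0]])
u_demo = clipped_moves(G_demo, np.array([2.0, 2.0]), 0.0, -1.0, 1.0)
```

Each pass solves only a small linear system, which is why this approach is lighter than a general quadratic programming solver, although it is not guaranteed to reach the true constrained optimum.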

Reactor Testing 

As previously seen, a preferred embodiment of the dynamic system model is 
capable of tracking and predicting the dynamic behavior of multiple heat zones within 
reactor 20. Likewise, a preferred multivariate temperature control system of the present 
invention is capable of maintaining a predetermined temporal sequence of temperatures 
for each heat zone of the reactor 20 as exemplified by Figure 12A. The solid curves 400, 
402, 404, 406 of Figure 12A indicate the temperature set-point sequence to be followed 
by independent heat zones: center, side, front and rear respectively. The dashed curves 
401, 403, 405 and 407 are the respective temperature profiles followed by the center, 
side, front and rear heat zones as a result of action by the temperature controller 170. 
Time lag between zones is substantially eliminated due to the predictive action by 
controller 170 operating on all zones in parallel. Furthermore, temperature differences 
between zones, as intentionally programmed in Figure 12A, become a relatively simple 
matter of zone-to-zone offset control. As shown in Figure 12B, the temperature 
controller 170 supplies the plurality of SCRs with drive signals appropriate for the 
respective heat zones at a given time. The curves 410, 412, 414 and 416 correspond to 
the center, side, front and rear SCR drive signals respectively. Thus, while the temporal 
setpoint sequence and actual temperature profile are qualitatively similar for each of the 
four heat zones (Figure 12A), the SCR drive signals for each zone display very different 
behavior as determined by the temperature controller 170. 

An exemplary demonstration of predictive control versatility is seen in Figure 
13A, wherein each zone separately is provided with a temperature step sequence, 
initially positive then negative. As seen in Figure 13A, initially the center zone (1) is 
programmed for a positive temperature excursion, then a negative temperature 
excursion, followed in succession by the side (2), front (3) and rear (4) zones. The 
controller 170 provides the necessary control signals concurrently to all four zones such 
that each zone, independently, maintains the programmed temperature profile. Note that 
while a specific zone is ramped up or down, the other zone temperatures are 
substantially unchanged, indicating the substantially complete decoupling of heat zones 
as a result of the model-based predictive control. As shown in Figure 13B, the 
exceptional temperature control displayed by the preferred embodiment is also manifest 
in the control signals. To account for the strong thermal coupling between zones, the 
controller compensates by driving each zone with a signal appropriate to maintain the 
prescribed temperature profile, both spatially and temporally. Evidently, the model- 
based predictive control system of the present invention, implemented in a rapid thermal 
process reactor, substantially optimizes process cycle time as well as spatial temperature 
uniformity. 

Detailed Description of the Nonlinear and Neural Network Embodiments 

OVERVIEW OF NONLINEAR RTP PROCESS CONTROL 

In yet another embodiment of model-based predictive controllers, the linear 
model disclosed above can be further enhanced by using a nonlinear model of the 
process reactor. A preferred method for implementing the nonlinear model involves the 
use of neural networks. A preferred embodiment of the neural network based nonlinear 
predictive controller is a Neural Extended Prediction control (NEPco) neural model 
based predictive controller for the susceptor temperature control of the ASMA reactor. 

Figure 14A is a block diagram that illustrates a fabrication system 1400. A 
recipe block 1401 provides input into a NEPco process block 1402. The NEPco process 
1402 outputs control signals to one or more SCRs that operate one or more lamps 1403. 
The lamps 1403 provide heat to a reactor 20 which is represented by a reactor process 
block 1404. A group of immeasurable outputs from the reactor process block 1404 are 
the wafer surface temperatures 1405. A group of measurable outputs from the reactor 
process block 1404 are the susceptor temperatures 1406. The susceptor temperatures are 
fed back into the NEPco process block 1402 to facilitate temperature control of the 
wafer 22 and the susceptor 24. 

The temperature of the wafer surface is of major importance for the deposition 
process. However, the wafer temperature is not measured during normal operation. The 
only signals which are directly measured for control purposes are the susceptor 
temperatures. Experiments have indicated that these susceptor temperatures provide a 
reasonable approximation of the unknown wafer temperature distribution. Experimental 
results indicate that good susceptor control alone is not sufficient to obtain very tight 
wafer control. 

The NEPco embodiment of the present invention discloses a procedure for 
improved control of the susceptor temperature signals 1406. This improvement 
provides the immediate benefits of improving the temperature control of the susceptor 
24 and therefore the wafer 22, and it sets the stage for improvements using various 
models based on the soft sensor principle. 

Figure 14B illustrates an overview of the hardware, software, and conceptual 
components that comprise the system 1400. The reader is urged to refer back to Figure 
14B before reading each section below in order to place the section about to be read in 
context. Figure 14B shows a three-layer structure of elements that comprise the system 
1400. Lower levels in the structure represent, at greater levels of detail, the internal 
elements of the upper layers. A controller system layer 1410 comprises the system 1400 
and is the topmost level of the system 1400. Working downward, the next level is the 
predictive modeling level 1411 which comprises a predictor process 1500, a series- 
parallel predictor 1801, a parallel predictor 1800, and a neural network 1600. The 
lowest of the three levels is a training layer 1412 which comprises a pseudo least squares 
(PLS) block 2300, a pulsetest experiment block 1900, and an initial estimate block 
2400. 

Returning to the predictive modeling layer 1411, the predictor process 1500 is 
shown as being part of the NEPco process block 1402. The series-parallel predictor 
1801 and the parallel predictor 1800 are shown as being different implementations of 
the predictor process 1500. A unit step response 2100 is shown as being an internal 
component of the parallel predictor 1800. The neural network 1600 is shown as being a 
part of the parallel predictor 1800. 

Returning to the training layer, the PLS training method block 2300 is shown as 
applying to the neural network 1600. The pulsetest experiment block 1900 and the 
initial estimate block 2400 are shown as being inputs to the PLS training method block 
2300. 

THE NONLINEAR PROCESS MODEL 

Figure 15 illustrates a block diagram of the nonlinear process model 1500. A 
process input u(t) 1501 is the sole input to a model process block 1502. The process 
input 1501 appears in the equations as u(t) and is typically a voltage to the lamp driver 
SCRs. The model process block 1502 exhibits a nonlinear transfer function f(...). A 
model output x(t) 1504 is an output of the process block 1502. The model output 
1504 appears in the equations that follow as x(t) and is typically a temperature expressed 
in °C. The model output x(t) 1504 and a process disturbance n(t) 1503 are added 
together at a summing junction 1506. The output of the summing junction 1506 is a 
process output y(t) 1505. The process disturbance 1503 is expressed in the equations 
that follow as n(t) and is typically expressed as a temperature in °C. The process output 
1505 is expressed in the equations that follow as y(t) and is typically the susceptor 
temperature measurements expressed as a temperature in °C. Thus, as shown in Figure 
15, the process output 1505 can be expressed mathematically as y(t) = x(t) + n(t). 

The process disturbance n(t) 1503 includes all effects in the process output y(t) 
1505 which do not come from the model output x(t) 1504. The process disturbance n(t) 
1503 is a fictitious (immeasurable) signal which includes such disturbance effects as 
deposition, gas flow, measurement noise, model errors, etc. These disturbances 
typically have a stochastic character with nonzero average value. The disturbances can 
usually be modeled by a colored noise process given by: 

$$n(t) = \frac{C(q^{-1})}{D(q^{-1})}\, e(t) \qquad (31)$$

where: 

e(t) = white noise (uncorrelated noise with zero mean value) 

$$C(q^{-1}) = 1 + c_1 q^{-1} + \ldots + c_{n_c} q^{-n_c}$$

$$D(q^{-1}) = 1 + d_1 q^{-1} + \ldots + d_{n_d} q^{-n_d} \qquad (32)$$

As in the linear case, $q^{-1}$ is the backward shift operator where $q^{-n} s(t) = s(t-n)$ and s(t) is a 
time-dependent signal where t denotes a discrete time index (t = 0, 1, 2, ...). The filter 
$C(q^{-1})/D(q^{-1})$ is a disturbance model. While many acceptable disturbance models are 
possible, in the preferred embodiment for the ASMA application it has the structure: 

$$\frac{C(q^{-1})}{D(q^{-1})} = \frac{(1 + c\,q^{-1})^{2}}{(1 + d\,q^{-1})(1 - q^{-1})}$$

where c and d are design parameters (preferred values are: c = d = 0). 
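To make the disturbance model concrete, the sketch below filters a noise sequence through a general C/D filter written as a difference equation. With the preferred values c = d = 0 the filter reduces to 1/(1 - q^-1), so n(t) = n(t-1) + e(t), i.e. a running sum of the white noise. The function name and the coefficient-list convention (each list starts with the leading 1) are choices made for this illustration.

```python
import numpy as np

def colored_noise(e, c_coeffs, d_coeffs):
    """Compute n(t) from D(q^-1) n(t) = C(q^-1) e(t), zero initial state.

    c_coeffs = [1, c1, c2, ...] and d_coeffs = [1, d1, d2, ...].
    """
    n = np.zeros(len(e))
    for t in range(len(e)):
        acc = 0.0
        for k, ck in enumerate(c_coeffs):
            if t - k >= 0:
                acc += ck * e[t - k]       # moving-average part
        for k, dk in enumerate(d_coeffs[1:], start=1):
            if t - k >= 0:
                acc -= dk * n[t - k]       # auto-regressive part
        n[t] = acc
    return n

# Preferred case c = d = 0: C = 1 and D = 1 - q^-1 (an integrator).
e_demo = np.array([1.0, 2.0, 3.0])
n_demo = colored_noise(e_demo, c_coeffs=[1.0], d_coeffs=[1.0, -1.0])
```

The integrator in D is what lets the model represent disturbances with a nonzero average value, since a constant offset in n(t) corresponds to zero noise input after the initial step.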

The model output x(t) 1504 represents the effect of the power input u(t) 1501 on 
the susceptor (thermocouple) temperature. This is an immeasurable signal, as only the 
combined effect of control action plus disturbances is measurable via the thermocouple 
sensors 44, 46, 48, and 50. 

The relationship between the input u(t) and the output x(t) is a dynamic 
relationship in that the present temperature x(t) does not depend on the present input 
u(t), but on the previous temperatures {x(t-1), x(t-2), ...} and the previous inputs {u(t-1), 
u(t-2), ...}. Moreover, experimental measurements show that for a typical ASMA 
reactor, the relationship between u(t) and x(t) is also strongly nonlinear. For example, in 
one experiment the effect of a specific power input variation on the resulting 
temperature was found to be quite different around 800°C as compared to 1100°C. 
These temperatures are by way of example only since different reactors will exhibit 
different properties. 

The effect of u(t) on x(t) can thus be represented by a nonlinear dynamic model 
where the transfer function f[...] 1502 is an unknown nonlinear function, such that: 

$$x(t) = f[\,x(t-1),\ x(t-2),\ \ldots,\ u(t-1),\ u(t-2),\ \ldots\,]$$

In the preferred embodiment, the function f[...] is implemented as a neural network. 

Figure 16 illustrates a typical neural network. In Figure 16, the set of 
past model outputs 1604 {x(t-1), x(t-2), ...} and the set of past model inputs {u(t-1), 
u(t-2), ...} are shown as inputs to a layer of input neurons 1601. The input neurons 1601 are 
connected to a layer of hidden neurons 1602 such that every one of the input neurons 
1601 is connected to every one of the hidden neurons 1602. The hidden layer 1602 
contains three hidden neurons 1610, 1611, and 1612. The hidden neurons 1602 have 
outputs labeled $z_1 \ldots z_i \ldots z_n$ such that $z_1$ is the output of the first hidden neuron 1610 and 
$z_n$ is the output of the last hidden neuron 1612. The connections between the input 
neurons 1601 and the hidden neurons 1602 are labeled $w_{ij}^{[1]}$ where i indicates the hidden 
neuron having the output $z_i$ and j indicates which of the input neurons 1601 is being 
connected. The superscript [1] indicates the connection starts from the first layer of 
neurons. All of the hidden neurons 1602 are connected to an output neuron 1613 by 
connections labeled $w_i^{[2]}$ where i indicates the hidden neuron output $z_i$ that is being 
connected to the output neuron 1613. The superscript [2] indicates the connections from 
the second layer of neurons. 

The input neurons 1601 are non-active neurons in that the neurons do not 
perform any computation; they only distribute the input signals to the hidden neurons 
1602. In the preferred embodiment of the ASMA application, a third order model is 
used, meaning that six input neurons 1601 are provided, corresponding to the three 
previous values of x(t), namely x(t-1), x(t-2) and x(t-3), and the three previous values 
of u(t), namely u(t-1), u(t-2), and u(t-3). 

The hidden layer preferably contains nonlinear sigmoid-type neurons. Sigmoid 
neurons are well known in the art (see e.g., James A. Freeman and David M. Skapura, 
"Neural Networks," Addison Wesley, 1991). The hidden neuron outputs $z_i$ are computed 
as follows: 

$$z_i = s\!\left(W_i^{[1]} \cdot I + b_i^{[1]}\right)$$

where I is an input vector given by: 

$$I = [\,x(t-1)\ \ x(t-2)\ \ x(t-3)\ \ u(t-1)\ \ u(t-2)\ \ u(t-3)\,]^{T}$$

and $W_i^{[1]}$ is a weight vector given by: 

$$W_i^{[1]} = [\,w_{i1}^{[1]}\ \ w_{i2}^{[1]}\ \ w_{i3}^{[1]}\ \ w_{i4}^{[1]}\ \ w_{i5}^{[1]}\ \ w_{i6}^{[1]}\,]$$

The function s(x) is a sigmoid function shown graphically in Figure 22 and 
given mathematically by the equation: 

$$s(x) = \frac{1 - e^{-x}}{1 + e^{-x}} = \frac{2}{1 + e^{-x}} - 1$$

The parameters in the weight vectors $W_i^{[1]}$ (i = 1...n) and the biases $b_i^{[1]}$ (i = 1...n) 
are unknown and must be estimated from experimental data during the training of the 
neural net. The biases are used to compensate for an offset in the process model. 
The offset arises from the fact that, in reality, the output x(t) is not necessarily zero when 
the input u(t) is zero. 

Figure 19 shows a simple neural network 1900. The simple neural network 
1900 comprises a single hidden neuron 1904 of the sigmoid type. The hidden neuron 
1904 has a group of inputs 1901 comprised of model outputs $x_1(t-1)$, $x_1(t-2)$, and $x_1(t-3)$. 
The hidden neuron 1904 also has a group of model inputs 1902 comprised of model 
inputs $u_1(t-1)$, $u_1(t-2)$, and $u_1(t-3)$. The hidden neuron 1904 also has a group of model 
inputs 1903 comprised of model inputs $u_4(t-1)$, $u_4(t-2)$, and $u_4(t-3)$. Figure 19 further 
illustrates that the hidden neuron 1904 has inputs comprised of model inputs $u_2(t-1)$, 
$u_2(t-2)$, $u_2(t-3)$, $u_3(t-1)$, $u_3(t-2)$, and $u_3(t-3)$. An output of the hidden neuron 1904 feeds a 
linear output neuron 1905. The neural network 1900 has a single output $x_1(t)$ 1906. 

The simplest neural net has only one neuron in the hidden layer 1602 (n = 1) 
and thus only one output $z_1$. It was found experimentally that the simple neural network 
1900 (where n = 1) is a good choice for the ASMA application. Additional hidden 
neurons provide improvement of the control performance, but the computational load 
and the modeling effort both increase dramatically. 

The output layer contains the single linear output neuron 1613. The output of
the output neuron 1613 is computed as follows:

$$ x = W^{[2]} \cdot Z + b^{[2]} \qquad (33) $$

where

$$ Z = [z_1\;\; z_2\;\; \ldots\;\; z_i\;\; \ldots\;\; z_n]^T $$

and

$$ W^{[2]} = [w_1^{[2]}\;\; w_2^{[2]}\;\; \ldots\;\; w_i^{[2]}\;\; \ldots\;\; w_n^{[2]}] $$

For the ASMA application with only one neuron in the hidden layer (n=1),
equation (33) reduces to

$$ x = w^{[2]} z + b^{[2]} $$

The weight and bias of the output neuron should be identified together with
those of the hidden-layer neuron. In fact, all of the weight and bias parameters together
constitute the model of the unknown process dynamics.
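
The single-hidden-neuron model described above can be sketched in a few lines of Python. This is only an illustration: the function names, the parameter values, and the use of tanh as the sigmoid (chosen because it has the -1 to 1 output range shown in Figure 22) are assumptions, not values taken from the ASMA application.

```python
import math

def sigmoid(v):
    # Assumption: a bipolar sigmoid with outputs in (-1, 1); tanh is one such function.
    return math.tanh(v)

def nn_output(past_x, past_u, w1, b1, w2, b2):
    """Single-hidden-neuron model: x = w2 * s(W1 . I + b1) + b2."""
    # I = [x(t-1) x(t-2) x(t-3) u(t-1) u(t-2) u(t-3)]
    inputs = list(past_x) + list(past_u)
    if len(inputs) != len(w1):
        raise ValueError("weight vector and input vector must have equal length")
    z = sigmoid(sum(w * v for w, v in zip(w1, inputs)) + b1)  # hidden neuron
    return w2 * z + b2                                        # linear output neuron

# Hypothetical, untrained parameter values chosen only for illustration.
W1 = [0.5, 0.2, 0.1, 0.3, 0.1, 0.05]
x_now = nn_output([0.2, 0.1, 0.0], [1.0, 1.0, 1.0], W1, b1=-0.1, w2=2.0, b2=0.05)
```

With all inputs and the hidden bias at zero, the model output equals the output bias, which is the offset that the biases are said to compensate for.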
THE NONLINEAR MULTISTEP PREDICTOR 

As in the linear case, the notation y(t+k|t) denotes the predicted value of y(t+k)
at time t, for k = 1...N_2, where N_2 is the prediction horizon. Thus, y(t+k|t) is based on:

• measurements available at time t, i.e., {y(t), y(t-1), ..., u(t-1), u(t-2), ...}; and
• future (postulated) values of the input {u(t|t), u(t+1|t), ...}.

In other words, the notation (...|t) means 'postulated at time t'. Using the process
model 1500, from Figure 15, it follows that:

$$ y(t+k|t) = x(t+k|t) + n(t+k|t) $$

The Method 

Of the many possible configurations known in the art for recursion of a
nonlinear network model, the two most preferred configurations for modeling the
ASMA reactor are a parallel model and a series-parallel model. There is no requirement
that the nonlinear model 1502 be based upon a neural network. However, the preferred
embodiment uses a neural network. For convenience and clarity of presentation herein,
the model will be assumed to be implemented using a neural network, with the
understanding that other (non-neural network) implementations are possible.

Figures 17A and 17B show block diagrams of two common recursion networks.
Figure 17A is a block diagram of the parallel model network. In Figure 17A, the model
1701 is shown as a neural network (NN) process block with an input vector 1707 and a
single output x(t+k|t) 1704. The input vector 1707 has a group of inputs 1702
comprising model outputs 1504. The model outputs 1504 comprise x(t+k-1|t), x(t+k-2|t),
and x(t+k-3|t). The input vector 1707 has a group of process inputs 1703
comprising process inputs 1501. The inputs 1501 comprise u(t+k-1|t), u(t+k-2|t),
and u(t+k-3|t). Figure 17B shows the series-parallel model neural network as an
NN-block 1751, which is a process block with an NN-input vector 1757 and a single output
x(t+k|t) 1754. The NN-input vector 1757 has a group of inputs 1752 comprising
process outputs 1505. The inputs 1505 comprise y(t+k-1|t), y(t+k-2|t), and y(t+k-3|t).
The NN-input vector 1757 also has a group of inputs 1702 comprising process inputs
u(t+k-1|t), u(t+k-2|t), and u(t+k-3|t).

The parallel model, also known in the art as the independent model, preferably
should be used only for stable processes. The series-parallel model can also be used for
unstable processes. To obtain similar control performance with both models, the
disturbance model $C(q^{-1})/D(q^{-1})$ should be chosen differently. Both models are useful
for the ASMA application; however, the parallel model is preferred and so it is
described in greater detail herein.

The Parallel Model: Prediction of x(t+k|t)

At each sampling instant t, the recursion is started with k=0 and x(t|t) is
computed using the NN input vector 1707 [x(t-1) x(t-2) x(t-3) u(t-1) u(t-2) u(t-3)],
which contains values from the past, thus known at time t. Notice that x(t) = x(t|t)
and that this value can be saved in the database for further use at future sampling
instants.

Then for k=1, the previously computed x(t|t) is used at the NN input to
compute x(t+1|t), etc. Notice that x(t+1) ≠ x(t+1|t), but x(t+1) = x(t+1|t+1). The
value x(t+1|t) can thus be discarded after time t. The recursion is restarted at each
sampling instant, because x(t+k|t+1) ≠ x(t+k|t) for k>0. Indeed, x(...|t+1) is
computed based on information available and postulated at time t+1 while x(...|t) is
based on information that was available and postulated at time t. This information is
different, as the knowledge base is updated at every sampling instant with new
information coming from the sensor data.
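
The parallel-model recursion can be illustrated with the following Python sketch, in which each prediction is fed back into the model input together with the postulated future inputs. The function names, the parameter dictionary `p`, and the tanh sigmoid are assumptions made for illustration only.

```python
import math

def model(x_hist, u_hist, p):
    """One-step NN model; x_hist and u_hist are ordered most recent first."""
    s = sum(w * v for w, v in zip(p["w1"], list(x_hist) + list(u_hist))) + p["b1"]
    return p["w2"] * math.tanh(s) + p["b2"]   # tanh is an assumed sigmoid

def predict_parallel(x_past, u_past, u_future, p):
    """Return [x(t|t), x(t+1|t), ..., x(t+N|t)] with N = len(u_future).

    x_past   = [x(t-1), x(t-2), x(t-3)]     (stored model outputs)
    u_past   = [u(t-1), u(t-2), u(t-3)]     (measured inputs)
    u_future = [u(t|t), ..., u(t+N-1|t)]    (postulated inputs)
    """
    x_hist, u_hist = list(x_past), list(u_past)
    preds = [model(x_hist, u_hist, p)]        # k = 0: depends on past data only
    for u_next in u_future:
        x_hist = [preds[-1]] + x_hist[:2]     # feed the model output back in
        u_hist = [u_next] + u_hist[:2]        # shift in the postulated input
        preds.append(model(x_hist, u_hist, p))
    return preds
```

Only the first element, x(t|t), needs to be kept for the next sampling instant; the remaining elements are recomputed once new measurements arrive.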

The Parallel Model: Prediction of n(t+k|t)

At time t, using the data [x(t-1), x(t-2), x(t-3), u(t-1), u(t-2), u(t-3)], x(t)
is computed using the NN-model 1701. Using the measured value y(t), the current value
of the disturbance n(t) 1503 is computed using the process model: n(t) = y(t) - x(t).
Notice that the previous values of n(t), namely {n(t-1), n(t-2), ...} are available in the
computer memory.

The filtered disturbance signal

$$ n_f(t) = \frac{D(q^{-1})}{C(q^{-1})}\, n(t) $$

is computed using the difference equation:

$$ n_f(t) = -c_1 n_f(t-1) - c_2 n_f(t-2) - \ldots + n(t) + d_1 n(t-1) + d_2 n(t-2) + \ldots $$

Since the disturbance model is:

$$ n(t) = \frac{C(q^{-1})}{D(q^{-1})}\, e(t) $$

then the signal $n_f(t) = e(t)$. As white noise is, by definition, uncorrelated, the best
prediction of the white noise is the mean value, which is zero. Thus:

$$ n_f(t+k|t) \equiv 0, \quad k = 1 \ldots N_2 $$

The best prediction of the disturbance is obtained from:

$$ n(t+k|t) = \frac{C(q^{-1})}{D(q^{-1})}\, n_f(t+k|t) $$

which can be computed using the difference equation:

$$ n(t+k|t) = -d_1 n(t+k-1|t) - d_2 n(t+k-2|t) - \ldots + n_f(t+k|t) + c_1 n_f(t+k-1|t) + c_2 n_f(t+k-2|t) + \ldots $$

The recursion goes from k = 1 ... N_2. The recursion starts with k=1. The signal
values in the right-hand side, namely n(t|t), n(t-1|t), ..., $n_f(t|t)$, $n_f(t-1|t)$, ... are known,
while $n_f(t+1|t) = 0$. The computed value n(t+1|t) is then used in the right-hand side,
together with $n_f(t+2|t) = 0$ in order to compute n(t+2|t), etc.
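
The two difference equations above can be sketched in Python as follows. The function names and the list conventions are assumptions for illustration; the coefficient lists hold c_1, c_2, ... and d_1, d_2, ... with the leading coefficient of each polynomial taken as 1, and the history lists are assumed to be at least as long as the coefficient lists.

```python
def filter_disturbance(n_hist, nf_hist, c, d):
    """n_f(t) = -c1*n_f(t-1) - c2*n_f(t-2) - ... + n(t) + d1*n(t-1) + ...

    n_hist  = [n(t), n(t-1), ...]
    nf_hist = [n_f(t-1), n_f(t-2), ...]
    """
    acc = n_hist[0]
    acc += sum(dj * n for dj, n in zip(d, n_hist[1:]))
    acc -= sum(cj * nf for cj, nf in zip(c, nf_hist))
    return acc

def predict_disturbance(n_hist, nf_hist, c, d, horizon):
    """Return [n(t+1|t), ..., n(t+horizon|t)] with n_f(t+k|t) = 0 for k >= 1.

    n_hist  = [n(t), n(t-1), ...]
    nf_hist = [n_f(t), n_f(t-1), ...]
    """
    n, nf = list(n_hist), list(nf_hist)
    out = []
    for _ in range(horizon):
        nxt = -sum(dj * v for dj, v in zip(d, n)) + sum(cj * v for cj, v in zip(c, nf))
        n.insert(0, nxt)      # the prediction becomes the newest disturbance value
        nf.insert(0, 0.0)     # best prediction of white noise is its mean, zero
        out.append(nxt)
    return out
```

As a hypothetical example, choosing C = 1 and D = 1 - q^-1 (so `c = []` and `d = [-1.0]`) makes every predicted disturbance equal to the current value n(t), which corresponds to a constant offset between model and process.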

The Algorithms

Figure 18 is a flowchart that illustrates the process for computing a new
set of predictions for n(t+k|t), u(t+k|t), and y(t+k|t) at each time-step t.

(1) Measure y(t) at a process block 1801 and store the data in a database
containing {y(t), y(t-1), ...}.

(2) Measure u(t-1) at a process block 1802 and store in a database containing
{u(t-1), u(t-2), ...}.

(3) Postulate a future control policy {u(t|t), u(t+1|t), ..., u(t+N_2|t)} in a
process block 1803.

(4) In a process block 1804, compute the current model output x(t):

$$ x(t|t) = s\left(W^{[1]} \cdot I + b^{[1]}\right) \cdot w^{[2]} + b^{[2]} $$

where s(...) denotes the sigmoid function;

$$ I = [x(t-1)\;\; x(t-2)\;\; x(t-3)\;\; u(t-1)\;\; u(t-2)\;\; u(t-3)]^T; \text{ and} $$

$W^{[1]} = [w_1^{[1]}\; w_2^{[1]}\; w_3^{[1]}\; w_4^{[1]}\; w_5^{[1]}\; w_6^{[1]}]$, $b^{[1]}$, $w^{[2]}$, $b^{[2]}$ are the NN weight and
bias parameters. Notice that x(t|t) is not really a prediction because it
depends only on past values and not on the future control inputs, so x(t|t)
= x(t). The value x(t|t) is saved in a database containing {x(t), x(t-1),
x(t-2), ...} because it is used again at the next sampling instant.

(5) Compute n(t) = y(t) - x(t) in a process block 1805 and save the value
in a database containing {n(t), n(t-1), n(t-2), ...}.

(6) In a process block 1806, compute the filtered disturbance signal $n_f(t)$
from:

$$ n_f(t) = -c_1 n_f(t-1) - c_2 n_f(t-2) - \ldots + n(t) + d_1 n(t-1) + d_2 n(t-2) + \ldots $$

and save in a database containing {$n_f(t)$, $n_f(t-1)$, $n_f(t-2)$, ...}.

(7) In a process block 1807, reset the prediction values

$$ n_f(t+1|t) = n_f(t+2|t) = \ldots = n_f(t+N_2|t) \equiv 0. $$

(8) In a process block 1808, compute the predictions n(t+1|t), n(t+2|t), ...,
n(t+N_2|t) from:

$$ n(t+1|t) = -d_1 n(t) - d_2 n(t-1) - \ldots + n_f(t+1|t) + c_1 n_f(t) + c_2 n_f(t-1) + \ldots $$
$$ n(t+2|t) = -d_1 n(t+1|t) - d_2 n(t) - \ldots + n_f(t+2|t) + c_1 n_f(t+1|t) + c_2 n_f(t) + \ldots $$
$$ \vdots $$
$$ n(t+N_2|t) = -d_1 n(t+N_2-1|t) - d_2 n(t+N_2-2|t) - \ldots + n_f(t+N_2|t) + c_1 n_f(t+N_2-1|t) + \ldots $$

(9) In a process block 1809, compute the predictions x(t+1|t), x(t+2|t), ...,
x(t+N_2|t) from:

$$ x(t+1|t) = s\left(W^{[1]} \cdot I + b^{[1]}\right) \cdot w^{[2]} + b^{[2]} \quad \text{with } I = [x(t)\;\; x(t-1)\;\; x(t-2)\;\; u(t|t)\;\; u(t-1)\;\; u(t-2)]^T $$
$$ x(t+2|t) = s\left(W^{[1]} \cdot I + b^{[1]}\right) \cdot w^{[2]} + b^{[2]} \quad \text{with } I = [x(t+1|t)\;\; x(t)\;\; x(t-1)\;\; u(t+1|t)\;\; u(t|t)\;\; u(t-1)]^T $$
$$ \vdots $$
$$ x(t+N_2|t) = s\left(W^{[1]} \cdot I + b^{[1]}\right) \cdot w^{[2]} + b^{[2]} \quad \text{with } I = [x(t+N_2-1|t)\;\; \ldots\;\; u(t+N_2-3|t)]^T $$

Note that all data indicated with (...|t) can in principle be discarded after time t
because these data depend on information available at time t and are recomputed at
every sampling instant, after new measurement information is obtained.
THE NONLINEAR SISO PREDICTIVE CONTROLLER 

As with the linear case, the single input, single output (SISO) controller will be
discussed first because it is simpler than the more general multiple input, multiple
output model, and yet illustrates the basic principles. Figure 20 illustrates the
waveforms in the SISO controller for α = 0 (defined below). Figure 20 shows a two-axis
plot having an x-axis 2001 showing time, and having a y-axis 2002 showing a curve 2003
representing u, a curve 2004 representing y, and a horizontal line 2005 representing the
curve w/r. The y-axis 2002 is positioned on the x-axis 2001 at time t. Therefore, time
values on the x-axis 2001 that lie to the right of the y-axis 2002 represent the future,
such as u(t+k|t). Similarly, points on the x-axis 2001 that lie to the left of the y-axis
2002 represent the past.

The ultimate objective of the SISO controller is to find the control input u(t|t)
which minimizes the cost function:

$$ J = \sum_{k=N_1}^{N_2} \left[ r(t+k|t) - y(t+k|t) \right]^2 + \lambda \sum_{k=0}^{N_u-1} \left[ \Delta u(t+k|t) \right]^2 $$

where:

$$ \Delta u(t+k|t) = u(t+k|t) - u(t+k-1|t) \equiv 0 \quad \text{for } k \ge N_u; $$
$$ r(t+k|t) = \alpha\, r(t+k-1|t) + (1-\alpha)\, w(t+k|t), \quad \text{for } k = 1 \ldots N_2; \text{ and} $$
$$ r(t|t) = y(t). $$

The design parameters and their preferred values are:

• N_2 = the prediction horizon (preferred values = 3...9)
• N_u = the control horizon (preferred value = 1)
• N_1...N_2 = the coincidence horizon (preferred values = 1...N_2)
• λ = the weight parameter (preferred value = 0)
• α = the filter parameter (preferred value = 0)
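
The reference trajectory r(t+k|t) defined above is a first-order filter on the setpoint, started at the current measurement. A minimal Python sketch follows; the function name and the numerical values in the example are illustrative assumptions.

```python
def reference_trajectory(y_now, setpoints, alpha):
    """r(t+k|t) = alpha*r(t+k-1|t) + (1-alpha)*w(t+k|t), started at r(t|t) = y(t).

    setpoints = [w(t+1|t), ..., w(t+N2|t)]
    """
    r_prev, traj = y_now, []
    for w in setpoints:
        r_prev = alpha * r_prev + (1.0 - alpha) * w
        traj.append(r_prev)
    return traj

# Hypothetical example: measured 900 degrees C, setpoint 1000 degrees C.
smooth = reference_trajectory(900.0, [1000.0] * 3, alpha=0.5)
direct = reference_trajectory(900.0, [1000.0] * 3, alpha=0.0)
```

With the preferred value α = 0 the reference trajectory coincides with the setpoint; larger values of α make the controller approach the setpoint more gradually.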

Free Response and Forced Response

Conceptually, the future response y(t+k|t) can be considered as the sum of two
separable effects, namely the free response and the forced response, where:

$$ y(t+k|t) = y_{free}(t+k|t) + y_{forced}(t+k|t) $$

The free response $y_{free}(t+k|t)$ is a direct result of: (1) the effect of past control {u(t-1),
u(t-2), ...} as if {Δu(t|t) = Δu(t+1|t) = ... = Δu(t+N_u-1|t) = 0} or {u(t|t) = u(t-1),
u(t+1|t) = u(t-1), ...}; and (2) the effect of future disturbances n(t+k|t). The free
response $y_{free}(t+k|t)$ can be computed with the procedure described in Figure 18, with

$$ u(t|t) = u(t+1|t) = \ldots = u(t+N_2|t) = u(t-1). $$

The forced response $y_{forced}(t+k|t)$ is a direct result of the effect of future
control actions {Δu(t|t), Δu(t+1|t), ..., Δu(t+N_u-1|t)}. In the preferred embodiment, the
forced response $y_{forced}(t+k|t)$ is the effect of a sequence of step inputs 1920 having:

(1) a step with amplitude Δu(t|t) at time t, resulting in a contribution
$g_k \Delta u(t|t)$ to the predicted process output at time (t+k) (k sampling periods
later);

(2) a step with amplitude Δu(t+1|t) at time (t+1), resulting in a contribution
$g_{k-1} \Delta u(t+1|t)$ to the predicted process output at time (t+k) (k-1 sampling
periods later);

(3) etc., such that the total effect is:

$$ y_{forced}(t+k|t) = g_k \Delta u(t|t) + g_{k-1} \Delta u(t+1|t) + \ldots + g_{k-N_u+1} \Delta u(t+N_u-1|t) $$

The parameters $g_1, g_2, \ldots, g_k, \ldots, g_{N_2}$ are the coefficients of the unit step response
of the system, where the unit step response is the response of the system output to a
stepwise change of the system input (with amplitude 1). For a nonlinear system, such as
a NN, the unit step response is different for each operating point. Thus it should be
computed at each sampling instant by applying a fictitious stepwise change to the
current process input 1501 and computing its effect on the process output 1505, using
the NN-model 1701. Finally, note that $g_0 = g_{-1} = \ldots \equiv 0$.

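In expanded matrix notation, the forced response is expressed as:

$$
\begin{bmatrix} y_{forced}(t+1|t) \\ y_{forced}(t+2|t) \\ \vdots \\ y_{forced}(t+N_2|t) \end{bmatrix}
=
\begin{bmatrix}
g_1 & 0 & 0 & \cdots & 0 \\
g_2 & g_1 & 0 & \cdots & 0 \\
\vdots & & & & \vdots \\
g_{N_2} & g_{N_2-1} & g_{N_2-2} & \cdots & g_{N_2-N_u+1}
\end{bmatrix}
\begin{bmatrix} \Delta u(t|t) \\ \Delta u(t+1|t) \\ \vdots \\ \Delta u(t+N_u-1|t) \end{bmatrix}
$$

Now changing notation for simplicity, let $\bar{y}(t+k|t) = y_{free}(t+k|t)$; then:

$$
\begin{bmatrix} r(t+N_1|t) - y(t+N_1|t) \\ \vdots \\ r(t+N_2|t) - y(t+N_2|t) \end{bmatrix}
=
\begin{bmatrix} r(t+N_1|t) - \bar{y}(t+N_1|t) \\ \vdots \\ r(t+N_2|t) - \bar{y}(t+N_2|t) \end{bmatrix}
-
\begin{bmatrix}
g_{N_1} & g_{N_1-1} & \cdots & g_{N_1-N_u+1} \\
\vdots & & & \vdots \\
g_{N_2} & g_{N_2-1} & \cdots & g_{N_2-N_u+1}
\end{bmatrix}
\begin{bmatrix} \Delta u(t|t) \\ \Delta u(t+1|t) \\ \vdots \\ \Delta u(t+N_u-1|t) \end{bmatrix}
$$

or using compact matrix notation:

$$ (R - Y) = (R - \bar{Y}) - G \cdot U. $$

With this notation, the cost function becomes:

$$ (R - Y)^T (R - Y) + \lambda U^T U = \left[ (R - \bar{Y}) - G U \right]^T \left[ (R - \bar{Y}) - G U \right] + \lambda U^T U $$

Minimization with respect to U gives an optimal solution:

$$ U^* = \left( G^T G + \lambda I \right)^{-1} \cdot G^T (R - \bar{Y}) $$

where I is the identity matrix.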

The following comments are in order. First, only the first element, Δu(t|t), in
U* is required to compute the control input u(t) = u(t-1) + Δu(t|t). At the next sampling
instant (t+1), the whole procedure is repeated, taking into account the new measurement
information y(t+1). This is called the "receding horizon" principle of MBPC. Second,
the matrix $[G^T G + \lambda I]$ which must be inverted has dimension N_u x N_u. For the default
case, where N_u = 1, this results in the scalar control law:

$$ u(t) = u(t-1) + \frac{\sum_{k=N_1}^{N_2} g_k \left[ r(t+k|t) - \bar{y}(t+k|t) \right]}{\sum_{k=N_1}^{N_2} g_k^2 + \lambda} $$

Finally, the notation w(...|t) means the future setpoint as postulated at time t. If the
setpoint is preprogrammed, the future setpoint values w(t+k) can be used for w(t+k|t):
w(t+k|t) = w(t+k), k = 1...N_2. The predictive control strategy will then take action in
advance, before the actual setpoint change occurs. If this is not desired, then the current
setpoint value is used for w(t+k|t):

$$ w(t+k|t) = w(t), \quad \text{for } k = 1 \ldots N_2. $$
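
For the default case N_u = 1, the control law reduces to a ratio of two sums. The following Python sketch shows that computation under stated assumptions: the function name and the sample numbers are hypothetical, and the three lists are assumed to hold the values for k = N_1...N_2 in the same order.

```python
def scalar_control_move(g, r, y_free, lam=0.0):
    """Delta u(t|t) for N_u = 1: sum(g_k*(r_k - ybar_k)) / (sum(g_k^2) + lam)."""
    num = sum(gk * (rk - yk) for gk, rk, yk in zip(g, r, y_free))
    den = sum(gk * gk for gk in g) + lam
    if den == 0.0:
        raise ZeroDivisionError("step response is zero over the coincidence horizon")
    return num / den

# Hypothetical numbers: step response, reference trajectory, free response.
g = [1.0, 2.0]
du = scalar_control_move(g, r=[10.0, 10.0], y_free=[8.0, 6.0])
u_prev = 1.5
u_new = u_prev + du          # receding horizon: only this first move is applied
```

In this example the free response plus the forced response g_k * du reaches the reference exactly at both steps; a nonzero weight parameter `lam` shrinks the move.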

THE NONLINEAR MIMO PREDICTIVE CONTROLLER 
The Method 

In this section the SISO principles discussed above are extended to MIMO
systems. For simplicity a two input, two output system is discussed first. The extension
to the ASMA application with four inputs and four outputs will then follow in a
straightforward manner.

With two inputs and two outputs, the process model is now:

$$ y_1(t) = x_1(t) + n_1(t) $$
$$ y_2(t) = x_2(t) + n_2(t) $$

where:

$$ x_1(t) = f_1\left[ x_1(t-1) \ldots x_1(t-3),\; u_1(t-1) \ldots u_1(t-3),\; u_2(t-1) \ldots u_2(t-3) \right] $$
$$ x_2(t) = f_2\left[ x_2(t-1) \ldots x_2(t-3),\; u_1(t-1) \ldots u_1(t-3),\; u_2(t-1) \ldots u_2(t-3) \right] $$

As before, the functions $f_1[\ldots]$ and $f_2[\ldots]$ are nonlinear unknown process models.
In the SISO case, only one neural network was necessary; in the present case, with two
outputs, two neural networks are necessary.

Assuming a pair of white noise signals $e_1$ and $e_2$, the stochastic disturbances are
modeled by colored noise processes:

$$ n_1(t) = \frac{C(q^{-1})}{D(q^{-1})}\, e_1(t), \qquad n_2(t) = \frac{C(q^{-1})}{D(q^{-1})}\, e_2(t) $$

As with the SISO case, the objective is to find the control inputs $u_1(t|t)$ and $u_2(t|t)$
which minimize the cost function

$$ J = \sum_{k=N_1}^{N_2} \left\{ \left[ r_1(t+k|t) - y_1(t+k|t) \right]^2 + \left[ r_2(t+k|t) - y_2(t+k|t) \right]^2 \right\} + \lambda \sum_{k=0}^{N_u-1} \left\{ \left[ \Delta u_1(t+k|t) \right]^2 + \left[ \Delta u_2(t+k|t) \right]^2 \right\} $$

where $\Delta u_1(t+k|t) \equiv 0$ and $\Delta u_2(t+k|t) \equiv 0$ for $k \ge N_u$.

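For a 2x2 system, four step responses can be defined, describing the effect of a
stepwise change of each of the two inputs on each of the two outputs. The coefficients
of the step response of input j to output i are denoted by: $\{ g_1^{ij}\;\; g_2^{ij}\;\; g_3^{ij}\;\; \ldots \}$

Introducing the usual matrix notation, the forced response in $y_1(t+k|t)$ due to
postulated future variations for both control inputs is:

$$
\begin{bmatrix} y_{1forced}(t+1|t) \\ y_{1forced}(t+2|t) \\ \vdots \\ y_{1forced}(t+N_2|t) \end{bmatrix}
=
\begin{bmatrix}
g_1^{11} & 0 & \cdots & 0 \\
g_2^{11} & g_1^{11} & \cdots & 0 \\
\vdots & & & \vdots \\
g_{N_2}^{11} & g_{N_2-1}^{11} & \cdots & g_{N_2-N_u+1}^{11}
\end{bmatrix}
\begin{bmatrix} \Delta u_1(t|t) \\ \Delta u_1(t+1|t) \\ \vdots \\ \Delta u_1(t+N_u-1|t) \end{bmatrix}
+
\begin{bmatrix}
g_1^{12} & 0 & \cdots & 0 \\
g_2^{12} & g_1^{12} & \cdots & 0 \\
\vdots & & & \vdots \\
g_{N_2}^{12} & g_{N_2-1}^{12} & \cdots & g_{N_2-N_u+1}^{12}
\end{bmatrix}
\begin{bmatrix} \Delta u_2(t|t) \\ \Delta u_2(t+1|t) \\ \vdots \\ \Delta u_2(t+N_u-1|t) \end{bmatrix}
$$

A similar expression exists for $y_{2forced}(t+k|t)$.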

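Denote the free response in $y_1(t+k|t)$ by $\bar{y}_1(t+k|t)$, that is, the response obtained
by setting all future input variations equal to 0, such that
$u_1(t|t) = u_1(t+1|t) = \ldots = u_1(t-1)$ and $u_2(t|t) = u_2(t+1|t) = \ldots = u_2(t-1)$. This gives:

$$
\begin{bmatrix} r_1(t+N_1|t) - y_1(t+N_1|t) \\ \vdots \\ r_1(t+N_2|t) - y_1(t+N_2|t) \end{bmatrix}
=
\begin{bmatrix} r_1(t+N_1|t) - \bar{y}_1(t+N_1|t) \\ \vdots \\ r_1(t+N_2|t) - \bar{y}_1(t+N_2|t) \end{bmatrix}
-
\begin{bmatrix}
g_{N_1}^{11} & g_{N_1-1}^{11} & \cdots & g_{N_1-N_u+1}^{11} \\
\vdots & & & \vdots \\
g_{N_2}^{11} & g_{N_2-1}^{11} & \cdots & g_{N_2-N_u+1}^{11}
\end{bmatrix}
\begin{bmatrix} \Delta u_1(t|t) \\ \vdots \\ \Delta u_1(t+N_u-1|t) \end{bmatrix}
-
\begin{bmatrix}
g_{N_1}^{12} & g_{N_1-1}^{12} & \cdots & g_{N_1-N_u+1}^{12} \\
\vdots & & & \vdots \\
g_{N_2}^{12} & g_{N_2-1}^{12} & \cdots & g_{N_2-N_u+1}^{12}
\end{bmatrix}
\begin{bmatrix} \Delta u_2(t|t) \\ \vdots \\ \Delta u_2(t+N_u-1|t) \end{bmatrix}
$$

or using matrix notation:

$$ (R_1 - Y_1) = (R_1 - \bar{Y}_1) - G_{11} U_1 - G_{12} U_2 $$

and similarly for the 2nd output:

$$ (R_2 - Y_2) = (R_2 - \bar{Y}_2) - G_{21} U_1 - G_{22} U_2. $$

With this compact notation, the cost function introduced above can be rewritten
as:

$$ (R_1 - Y_1)^T (R_1 - Y_1) + (R_2 - Y_2)^T (R_2 - Y_2) + \lambda \left( U_1^T U_1 + U_2^T U_2 \right) $$

A compound matrix $G_1$ is defined as $G_1 = [G_{11}\; G_{12}]$, a compound matrix $G_2$ is defined as
$G_2 = [G_{21}\; G_{22}]$, and a compound vector U is defined as $U = [U_1^T\; U_2^T]^T$.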

Using these compound values, the expressions for the predicted error vectors
then become:

$$ (R_1 - Y_1) = (R_1 - \bar{Y}_1) - G_1 \cdot U $$
$$ (R_2 - Y_2) = (R_2 - \bar{Y}_2) - G_2 \cdot U $$

and the cost function becomes:

$$ J = \left[ (R_1 - \bar{Y}_1) - G_1 U \right]^T \left[ (R_1 - \bar{Y}_1) - G_1 U \right] + \left[ (R_2 - \bar{Y}_2) - G_2 U \right]^T \left[ (R_2 - \bar{Y}_2) - G_2 U \right] + \lambda U^T U $$

Minimization of this scalar expression with respect to the vector U (by setting
$\partial J / \partial U = 0$) leads to the optimal solution:

$$ U^* = \left[ G_1^T G_1 + G_2^T G_2 + \lambda I \right]^{-1} \cdot \left[ G_1^T (R_1 - \bar{Y}_1) + G_2^T (R_2 - \bar{Y}_2) \right] $$

Note that even in the preferred case where N_u = 1, a matrix inversion is required,
in this case a 2x2 matrix. In the general case, where n_u is the number of control inputs,
a matrix of dimension (N_u·n_u) x (N_u·n_u) must be inverted. Only two elements in U* are
used for applying the control at time t:

$$ U^*(1) = \Delta u_1(t|t) \;\Rightarrow\; u_1(t) = u_1(t-1) + U^*(1) $$
$$ U^*(N_u+1) = \Delta u_2(t|t) \;\Rightarrow\; u_2(t) = u_2(t-1) + U^*(N_u+1) $$



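Extension of the two input, two output case to four inputs (j = 1...4) and four
outputs (i = 1...4) is straightforward:

$$ U^* = \left[ \sum_{i=1}^{4} G_i^T G_i + \lambda I \right]^{-1} \cdot \sum_{i=1}^{4} G_i^T (R_i - \bar{Y}_i) $$

where:

$$ G_i = [G_{i1}\;\; G_{i2}\;\; G_{i3}\;\; G_{i4}], \quad i = 1 \ldots 4 $$
$$ U = [U_1^T\;\; U_2^T\;\; U_3^T\;\; U_4^T]^T $$

with:

$$ u_j(t) = u_j(t-1) + U_j^*(1), \quad j = 1 \ldots 4 $$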
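
The MIMO solution can be sketched in plain Python by accumulating the normal equations over all outputs and solving the resulting linear system. Everything named here (the function names, the small Gaussian-elimination helper, and the sample matrices) is an illustrative assumption; the example uses N_u = 1 with two inputs, so U holds one move per input.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]      # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                        # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def mimo_control_moves(G, E, lam=0.0):
    """U* = [sum_i Gi^T Gi + lam*I]^-1 * sum_i Gi^T Ei.

    G[i] is the compound matrix Gi (rows: prediction steps, columns: elements of U);
    E[i] is the error vector Ri - Ybar_i of output i.
    """
    n = len(G[0][0])
    a = [[lam if p == q else 0.0 for q in range(n)] for p in range(n)]
    b = [0.0] * n
    for Gi, Ei in zip(G, E):
        for row, e in zip(Gi, Ei):
            for p in range(n):
                b[p] += row[p] * e
                for q in range(n):
                    a[p][q] += row[p] * row[q]
    return solve_linear(a, b)

# Hypothetical 2x2 example with two prediction steps per output.
G1 = [[1.0, 0.0], [2.0, 0.5]]
G2 = [[0.0, 1.0], [0.5, 2.0]]
U = mimo_control_moves([G1, G2], [[1.0, 1.0], [-2.0, -3.5]])
```

Solving the linear system directly, rather than forming the inverse explicitly, is a common numerical choice and gives the same U*.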



The Algorithms 

At each sampling instant there are 16 step responses

$$ \left\{ g_k^{ij};\; i = 1 \ldots 4,\; j = 1 \ldots 4 \right\} \quad \text{for } k = 0 \ldots N_2 $$

relating each of the four SCR inputs to each of the four susceptor temperature sensor
outputs 44, 46, 48, and 50. The step responses are calculated by entering, for each input
$u_j$, j = 1...4, a step with size $S_j$ in the four process models relating to the four neural nets,
one for each output $x_i$ (i = 1...4).

Figure 21 is a flowchart illustrating the steps necessary to compute the step
responses. The process begins at a loop control block 2101. In the loop control block
2101, a loop counter n is set to the value 1, representing the first input. The process then
advances to a process block 2102 where $u_j(t+k|t)$ is initialized as follows:

$$ u_1(t+k|t) = u_1(t-1) + S_1 $$
$$ u_2(t+k|t) = u_2(t-1) $$
$$ u_3(t+k|t) = u_3(t-1) $$
$$ u_4(t+k|t) = u_4(t-1) \qquad \text{for } k = 0 \ldots N_2. $$

Processing then proceeds to a process block 2103 where the outputs of the neural
networks are computed, resulting in:

$$ x_1^{[1]}(t+k|t), \text{ output of 1st NN} $$
$$ x_2^{[1]}(t+k|t), \text{ output of 2nd NN} $$
$$ x_3^{[1]}(t+k|t), \text{ output of 3rd NN} $$
$$ x_4^{[1]}(t+k|t), \text{ output of 4th NN} \qquad k = 1 \ldots N_2. $$

Processing then proceeds to a loop control block 2104 which increments the loop
counter n to indicate the next input. Processing then returns to the process block 2102
where $u_j(t+k|t)$ is initialized as follows:

$$ u_1(t+k|t) = u_1(t-1) $$
$$ u_2(t+k|t) = u_2(t-1) + S_2 $$
$$ u_3(t+k|t) = u_3(t-1) $$
$$ u_4(t+k|t) = u_4(t-1) \qquad \text{for } k = 0 \ldots N_2. $$

Processing then advances to the process block 2103 where the networks are used to
calculate:

$$ x_1^{[2]}(t+k|t),\;\; x_2^{[2]}(t+k|t),\;\; x_3^{[2]}(t+k|t),\;\; x_4^{[2]}(t+k|t) \qquad k = 1 \ldots N_2. $$

The above process is repeated until all of the inputs have been traversed by the loop
counter n. When, in the process block 2104, the loop index n becomes larger than the
number of neural networks, then the process proceeds to a process block 2105. In the
process block 2105, set:

$$ u_1(t+k|t) = u_1(t-1) $$
$$ u_2(t+k|t) = u_2(t-1) $$
$$ u_3(t+k|t) = u_3(t-1) $$
$$ u_4(t+k|t) = u_4(t-1) \qquad \text{for } k = 0 \ldots N_2 $$

and then proceed to a process block 2106.

In the process block 2106, calculate with the four NN-models:

$$ x_1^{[0]}(t+k|t),\;\; x_2^{[0]}(t+k|t),\;\; x_3^{[0]}(t+k|t),\;\; x_4^{[0]}(t+k|t) \qquad k = 1 \ldots N_2 $$

The responses $\{ x_1^{[0]}(t+k|t) \ldots x_4^{[0]}(t+k|t) \}$ are the free responses of the neural
networks and are used to calculate the system free responses $\bar{y}_i(t+k|t)$,
where $\bar{y}_i(t+k|t) = x_i^{[0]}(t+k|t) + n_i(t+k|t)$. Processing then proceeds to a process block
2107 where the effect of a stepwise variation of an input, meaning the difference
between the NN-output with a step input and the NN-output without a step input (the
free response), is computed by:

$$ g_1^{ij} = \left[ x_i^{[j]}(t+1|t) - x_i^{[0]}(t+1|t) \right] / S_j $$
$$ g_2^{ij} = \left[ x_i^{[j]}(t+2|t) - x_i^{[0]}(t+2|t) \right] / S_j $$
$$ \vdots $$
$$ g_{N_2}^{ij} = \left[ x_i^{[j]}(t+N_2|t) - x_i^{[0]}(t+N_2|t) \right] / S_j $$

where i = 1...4 denotes the output number, j = 1...4 denotes the input number and division by
the step size $S_j$ is necessary to obtain the effect of a unit step. For nonlinear systems, the
magnitude of the step sizes $S_j$, j = 1...4, should be chosen according to the real input
variations $\Delta u_j$ that are expected to apply to the specific system. For the ASMA
application, an appropriate choice is $S_1 = S_2 = S_3 = S_4 = 1$ (as the range for the SCR inputs is
0 ... 5).
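
The final step of Figure 21, forming the step-response coefficients from the stepped and free model responses, can be sketched as follows. The function names, the nesting of the lists, and the sample numbers are assumptions for illustration; the model predictions themselves are taken as given inputs here.

```python
def step_response_coefficients(x_step, x_free, step_size):
    """g_k = [x_step(t+k|t) - x_free(t+k|t)] / S for k = 1...N2."""
    if step_size == 0:
        raise ValueError("step size must be nonzero")
    return [(xs - xf) / step_size for xs, xf in zip(x_step, x_free)]

def step_response_table(x_steps, x_frees, step_sizes):
    """Return g[i][j], a list over k, for output i and input j.

    x_steps[j][i]: predictions of output i when input j is raised by step_sizes[j];
    x_frees[i]:    predictions of output i with every input held at its last value.
    """
    return [[step_response_coefficients(x_steps[j][i], x_frees[i], step_sizes[j])
             for j in range(len(step_sizes))]
            for i in range(len(x_frees))]

# Hypothetical 2-output, 2-input example with a horizon of two steps.
x_frees = [[1.0, 1.0], [2.0, 2.0]]
x_steps = [[[1.5, 2.0], [2.0, 2.5]],      # input 1 stepped by 0.5
           [[1.0, 1.2], [2.4, 3.0]]]      # input 2 stepped by 2.0
g = step_response_table(x_steps, x_frees, [0.5, 2.0])
```

Because the model is nonlinear, the table would be recomputed at every sampling instant around the current operating point, as the text describes.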

Training the Neural Network 

Model Based Predictive Control (MBPC) is a control strategy which relies
heavily on the availability of the model 1502. The preceding sections have largely
assumed the existence of the model 1502, preferably based on a neural network 1600,
without elaborating how the model is generated. This section begins with a brief
discussion of the advantages of using a neural network 1600 as the basis for the model
1502 and then describes how the model is generated. Since the model is based on a
neural network 1600, generation of the model is largely a process of training the
neural network. Training the neural network corresponds to the training layer 1612 of
Figure 14B, and requires the PLS training method 2300, the pulsetest experiment 1900,
and the initial estimates 2400 shown in that Figure.

Modeling of a physical system for control purposes requires the finding of a
mathematical relationship (a model) between the system's inputs and outputs. For the
ASMA application, modeling entails construction of a mathematical model that
describes the effect of the SCR-signals (the inputs) on the susceptor thermocouple
signals 44, 46, 48, and 50 (the outputs). The model depends on the underlying physical
characteristics of the process, which in this case is mainly a thermal process. Instead of
building a first principles model, starting from complicated physical-chemical laws, the
preferred approach is to use a black box model (a neural network) and train that network
using experimental data obtained from the reactor during an identification experiment.

The obtained model should be quite general in that it should be valid for other
experimental data than those used during the identification experiment, as long as the
reactor is operating in similar conditions of temperature range and reactor configuration.
If essential changes occur, the process will generally need to be re-modeled. The
modeling of a typical ASMA reactor takes less than 1 hour, including the required
identification experiment.

In a preferred embodiment, a Pseudo Least Squares (PLS) method is used to
train the neural network 1600 as a nonlinear model for the ASMA reactor. The NN-model
is then further used in the NEPco predictive control strategy as shown in Figure
14B.

The training procedure consists of the following general steps:

(1) performing an experiment with the reactor to obtain the modeling data;
in the preferred embodiment this experiment is a pulsetest experiment
1900;

(2) training the neural network (NN) 1600 using the data obtained from the
pulsetest experiment 1900; in the preferred embodiment the NN model is
trained using a pseudo least squares (PLS) method 2300; and

(3) validating the resulting model.

The pulsetest experiment 1900 and PLS method 2300 are described in detail 
below. In a preferred embodiment, the software necessary to perform the modeling 
tasks is implemented using MATLAB®. However, the preferred embodiment could be 
re-coded in other languages without difficulty. 
THE PULSETEST IDENTIFICATION EXPERIMENT 

In the preferred embodiment, the ASMA reactor is a system with four inputs
(SCR-signals) and four outputs (thermocouple signals) as listed in Table II.

The inputs are denoted as:               The outputs are denoted as:
u_1(t): center SCR signal (0-5 V)        y_1(t): center thermocouple signal (°C)
u_2(t): front SCR signal (0-5 V)         y_2(t): front thermocouple signal (°C)
u_3(t): side SCR signal (0-5 V)          y_3(t): side thermocouple signal (°C)
u_4(t): rear SCR signal (0-5 V)          y_4(t): rear thermocouple signal (°C)

Table II. The four input ASMA reactor system

The reactor is computer-controlled and all signals are sampled on a discrete-time basis.
The symbol t denotes the discrete-time index (1,2,3,...). Training the neural network
1600 requires that a set of modeling coefficients {$W^{[1]}$, $b^{[1]}$, $W^{[2]}$, $b^{[2]}$} be generated. The
modeling coefficients depend on a sample period, SamplePeriod. In the preferred
embodiment, the SamplePeriod is 2 seconds. The numerical values in the model depend
on this sampling period. This means that the control, which is based on this model,
should also be executed with a sampling period of 2 seconds. The sample period can be
changed without ill effect, but if the control sampling period is changed, remodeling to
compute a new set of coefficients is prudent.

A characteristic of the model is that each output {y_1 ... y_4} depends on all four
inputs {u_1 ... u_4}. In order to identify these relationships, it is necessary to do an
experiment with the reactor in order to obtain useful identification data. A particularly
preferred experiment is the pulsetest, which consists of sending consecutively a pulse in
each SCR input and measuring each thermocouple reaction. In order to cover the entire
nonlinear operating range of the reactor (e.g. 800°C to 1100°C), the test is repeated at
several base values of the SCR inputs. A parameter Duration determines how many
samples each pulse lasts. In a preferred embodiment, the Duration is five samples (10
seconds).

A parameter BaseValues is a row vector containing one or more base values for
the SCR inputs, in volts (V). Typical BaseValues are [0.8, 1.3, 2.0], corresponding
approximately to reactor temperatures [800, 950, 1100] (in °C). More than three base
values can be used, leading to higher accuracy; however, this requires a correspondingly
longer experiment. The pulses are executed successively for each base value. The time
between two pulses, specified as a number of samples in a parameter Period, depends on
the settling time of the reactor. For a common reactor, typical values for the parameter
Period are between 60 and 120 samples. None of these parameter values are critical and
wide variation in values will yield acceptable results.

The duration of the pulsetest experiment is N samples (2*N seconds), where
N = Duration * Period * Nbase, where Nbase is the number of entries in the vector
BaseValues. The result of the pulsetest experiment 1900 is a dataset containing all input
and output samples of the pulsetest experiment. This dataset can be used by the
modeling software to train the NN model.
THE PSEUDO LEAST SQUARES NN TRAINING METHOD 
Mathematical Overview of the PLS Method

The preferred embodiment of a feed forward neural network for temperature
control, as shown in Figure 16, comprises: n inputs $x_j$ where j = 1...n; one hidden layer
with m nonlinear sigmoid type neurons; and a linear output layer with one output y. The
input layer is a layer of non-active neurons. The non-active neurons do not perform any
computation; they only distribute the input signals to the neurons in the hidden layer.
The hidden neurons have outputs $z_i$, where i = 1...m, and i refers to a specific hidden
neuron. The outputs $z_i$ of the hidden neurons are computed as follows:

$$ z_i = s(n_i) \qquad \text{...the output} $$

where:

$$ n_i = W_i^{[1]} \cdot X + b_i^{[1]} = \sum_{j=1}^{n} w_{ij}^{[1]} x_j + b_i^{[1]} \qquad \text{...a weighted, biased sum of the inputs} $$

and:

$$ X = [x_1\;\; x_2\;\; \ldots\;\; x_j\;\; \ldots\;\; x_n]^T \qquad \text{...the inputs} $$

and:

$$ W_i^{[1]} = [w_{i1}^{[1]}\;\; w_{i2}^{[1]}\;\; \ldots\;\; w_{ij}^{[1]}\;\; \ldots\;\; w_{in}^{[1]}] \qquad \text{...the weights} $$

The parameters in the weight vectors $W_i^{[1]}$ (i = 1...m) and the biases $b_i^{[1]}$ (i = 1...m)
are unknown and must be estimated from experimental data. The biases are desirable in
order to compensate for the fact that the output y is not necessarily zero when the input x
is zero.

The output layer contains a single linear neuron. The output y of the output neuron
is computed as follows:

$$ y = W^{[2]} \cdot Z + b^{[2]} = \sum_{i=1}^{m} w_i^{[2]} z_i + b^{[2]} $$

where:

$$ Z = [z_1\;\; z_2\;\; \ldots\;\; z_i\;\; \ldots\;\; z_m]^T $$

and:

$$ W^{[2]} = [w_1^{[2]}\;\; w_2^{[2]}\;\; \ldots\;\; w_i^{[2]}\;\; \ldots\;\; w_m^{[2]}] $$



Here again, training the NN involves estimating the weights W and biases b.

For the estimation of all of these parameters, a set of training data from the
pulsetest experiment is used. The data from the pulsetest experiment includes the
experimental inputs X(k), and the corresponding outputs T(k); k = 1...N. Thus, T(k) are
target values and N is the number of samples. The training of the NN consists of
estimating a set of parameters $W_i^{[1]}$, $b_i^{[1]}$, $W^{[2]}$, and $b^{[2]}$, where i = 1...m, such that,
given a set of inputs X(k), the outputs y(k), k = 1...N are as close as possible to the target
values T(k), k = 1...N.

The phrase "as close as possible" is generally quantified by a Sum of Squared
Errors (SSE) value V given by:

$$ V = \sum_{k=1}^{N} \left[ T(k) - y(k) \right]^2 $$

The NN herein is nonlinear, and thus no closed form method is currently known for
estimating $W_i^{[1]}$, $b_i^{[1]}$, $W^{[2]}$ and $b^{[2]}$. However, a heuristic training method, called Pseudo
Least Squares (PLS), has been found to work well in this application.

The PLS method has the advantages of simplicity, ease of programming and fast
training speed. The PLS method, described in more detail below, involves finding an
initial set of estimates, and then using an iterative procedure to refine the initial
estimates. Briefly, the iterative procedure involves starting at the hidden layer of
neurons and working forward, through the NN, towards the output neuron, refining the
parameters W and b for each layer. The following sections herein present the PLS
method and a procedure for implementing the method.
PLS Estimation of the Output Laver Parameters 

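The parameters $\{W^{[2]}, b^{[2]}\}$ of the output layer are estimated in order to minimize
the SSE loss value $V$:

$$V(W^{[2]}, b^{[2]}) = \sum_{k=1}^{N} [T(k) - y(k)]^2$$

All other network parameters $\{W_i^{[1]}, b_i^{[1]};\ i = 1 \ldots m\}$ are assumed to be known at this time.
Minimization is obtained by setting the derivatives of $V(W^{[2]}, b^{[2]})$ with respect to
$\{W^{[2]}, b^{[2]}\}$ equal to zero:

$$\frac{\partial V(W^{[2]}, b^{[2]})}{\partial W^{[2]}} = 0 \quad\text{and}\quad \frac{\partial V(W^{[2]}, b^{[2]})}{\partial b^{[2]}} = 0$$

For ease of notation, two extended vectors $\bar{W}^{[2]} = [W^{[2]} \;\; b^{[2]}]$ and
$\bar{Z} = \begin{bmatrix} Z \\ 1 \end{bmatrix}$ are defined.

Then the output $y$ can be written in terms of the extended vectors as:

$$y(k) = W^{[2]} \cdot Z(k) + b^{[2]} = \bar{W}^{[2]} \cdot \bar{Z}(k)$$

and thus the two conditions above can be combined as

$$\frac{\partial V(\bar{W}^{[2]})}{\partial \bar{W}^{[2]}} = 0 \qquad (34)$$

leading to

$$\sum_{k=1}^{N} [T(k) - y(k)] \cdot \frac{\partial y(k)}{\partial \bar{W}^{[2]}} = 0$$

With

$$\frac{\partial y(k)}{\partial \bar{W}^{[2]}} = \bar{Z}^T(k)$$

this gives:

$$\sum_{k=1}^{N} [T(k) - \bar{W}^{[2]} \cdot \bar{Z}(k)] \cdot \bar{Z}^T(k) = 0$$

A least squares solution to the above equation is:

$$\bar{W}^{[2]} = \left[\sum_{k=1}^{N} T(k) \cdot \bar{Z}^T(k)\right] \left[\sum_{k=1}^{N} \bar{Z}(k) \cdot \bar{Z}^T(k)\right]^{-1}$$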
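The output layer estimate is an ordinary linear least squares problem in the extended hidden outputs, which a minimal Python sketch can make concrete. It assumes NumPy and uses `numpy.linalg.lstsq`, which solves the same normal equations without forming the inverse explicitly. The function name `fit_output_layer` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_output_layer(Z, T):
    """Least squares estimate of the output weights and bias.

    Z: (N, m) hidden neuron outputs, T: (N,) target values.
    Returns (W2, b2) minimizing sum_k [T(k) - W2 . Z(k) - b2]^2.
    """
    Z_ext = np.hstack([Z, np.ones((Z.shape[0], 1))])    # extended vectors [Z(k); 1]
    w_ext, *_ = np.linalg.lstsq(Z_ext, T, rcond=None)   # solves the normal equations
    return w_ext[:-1], float(w_ext[-1])

# Synthetic check: targets generated by known output parameters are recovered.
rng = np.random.default_rng(0)
Z = np.tanh(rng.normal(size=(50, 3)))
T = Z @ np.array([0.5, -1.0, 2.0]) + 0.3
W2, b2 = fit_output_layer(Z, T)
```

Because the targets here are an exact linear function of the hidden outputs, the fitted weights and bias match the generating values up to rounding.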



PLS Estimation of the Hidden Layer Parameters
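The parameters $W_i^{[1]}$ and $b_i^{[1]}$ of neuron $i$ ($i = 1 \ldots m$) in the hidden layer are
estimated in order to minimize the SSE loss function:

$$V(W_i^{[1]}, b_i^{[1]}) = \sum_{k=1}^{N} [T(k) - y(k)]^2$$

All other network parameters
$W_1^{[1]}, b_1^{[1]};\ \ldots;\ W_{i-1}^{[1]}, b_{i-1}^{[1]};\ W_{i+1}^{[1]}, b_{i+1}^{[1]};\ \ldots;\ W_m^{[1]}, b_m^{[1]};\ W^{[2]}, b^{[2]}$
are assumed to be known.

Minimization is obtained by putting the derivatives of $V(W_i^{[1]}, b_i^{[1]})$ with respect to
$\{W_i^{[1]}, b_i^{[1]}\}$ equal to zero, such that:

$$\frac{\partial V}{\partial W_i^{[1]}} = 0 \quad\text{and}\quad \frac{\partial V}{\partial b_i^{[1]}} = 0$$

For ease of notation, two extended vectors $\bar{W}_i^{[1]} = [W_i^{[1]} \;\; b_i^{[1]}]$ and
$\bar{X} = \begin{bmatrix} X \\ 1 \end{bmatrix}$ are defined. Then,

$$n_i(k) = W_i^{[1]} \cdot X(k) + b_i^{[1]} = \bar{W}_i^{[1]} \cdot \bar{X}(k)$$

The condition

$$\frac{\partial V(\bar{W}_i^{[1]})}{\partial \bar{W}_i^{[1]}} = 0$$

gives

$$\sum_{k=1}^{N} [T(k) - y(k)] \cdot \frac{\partial y(k)}{\partial \bar{W}_i^{[1]}} = 0$$

Using the chain rule for differentiation, the above derivative is found to be:

$$\frac{\partial y(k)}{\partial \bar{W}_i^{[1]}} = \frac{\partial y(k)}{\partial z_i(k)} \cdot \frac{\partial z_i(k)}{\partial n_i(k)} \cdot \frac{\partial n_i(k)}{\partial \bar{W}_i^{[1]}} = W_i^{[2]} \cdot s'[n_i(k)] \cdot \bar{X}^T(k)$$

leading to the nonlinear estimator equations

$$\sum_{k=1}^{N} [T(k) - y(k)] \cdot W_i^{[2]} \cdot s'[n_i(k)] \cdot \bar{X}^T(k) = 0$$

Now introducing a back-propagated error term $\delta_i$, defined as:

$$\delta_i(k) = [T(k) - y(k)] \cdot W_i^{[2]} \cdot s'[n_i(k)]$$

results in

$$\sum_{k=1}^{N} \delta_i(k) \cdot \bar{X}^T(k) = 0$$

Now introduce a minimum back-propagation error:

$$\delta_i^*(k) = \varepsilon \cdot \frac{\delta_i(k)}{\max(|\delta_i(k)|;\ k = 1 \ldots N)}$$

where $\varepsilon$ is a small number (e.g., a negative power of 10). This guarantees that each
$\delta_i^*(k)$; $k = 1 \ldots N$ is a small number. The equation for the estimator then becomes:

$$\sum_{k=1}^{N} \delta_i^*(k) \cdot \bar{X}^T(k) = 0$$

or, writing $z_i^*(k) = z_i(k) + \delta_i^*(k)$,

$$\sum_{k=1}^{N} [z_i^*(k) - z_i(k)] \cdot \bar{X}^T(k) = 0$$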



Figure 22 illustrates the sigmoid function. Figure 22 shows the sigmoid function
plotted on an X axis 2201 ranging from -3 to 3, and a Y axis 2202 ranging from -1 to 1.
A neuron input n 2203 and corresponding neuron output z 2206 are shown on the X axis
2201 and Y axis 2202 respectively. Slightly displaced from the neuron input n 2203
and corresponding output z 2206 are a fictitious neuron input n* 2204 and a
corresponding fictitious neuron output z* 2205.

The neuron output z 2206 corresponds to fictitious neuron output z* 2205
according to the relationship $z_i^*(k) = z_i(k) + \delta_i^*(k)$. Thus $n_i^*(k)$ is such that
$z_i^*(k) = s[n_i^*(k)]$.

Note that given z* it is easy to compute n* as:

$$n^* = \frac{1}{2} \ln\left(\frac{1 + z^*}{1 - z^*}\right)$$
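Since the difference $z^* - z = \delta^*$ is very small, it can be stated, with arbitrary accuracy, that:

$$\frac{z^* - z}{n^* - n} = s'(n) \quad\text{or}\quad \delta^* = s'(n) \cdot (n^* - n)$$

The estimator equations thus become

$$\sum_{k=1}^{N} s'[n_i(k)] \cdot [n_i^*(k) - n_i(k)] \cdot \bar{X}^T(k) = 0$$

and with $n_i(k) = \bar{W}_i^{[1]} \cdot \bar{X}(k)$, thus leading to the least squares solution

$$\bar{W}_i^{[1]} = \left[\sum_{k=1}^{N} n_i^*(k) \cdot s'[n_i(k)] \cdot \bar{X}^T(k)\right] \left[\sum_{k=1}^{N} \bar{X}(k) \cdot s'[n_i(k)] \cdot \bar{X}^T(k)\right]^{-1}$$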
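The hidden layer update for a single neuron can be sketched as follows. This is a minimal illustration rather than a reference implementation. It assumes NumPy and a tanh sigmoid. The default `eps=1e-4` is an assumed value, because the source only says that the number is small, and the clipping step is a numerical safeguard that is not part of the description. The function name `update_hidden_neuron` is hypothetical.

```python
import numpy as np

def update_hidden_neuron(i, X, T, W1, b1, W2, b2, eps=1e-4):
    """One PLS update of hidden neuron i; all other parameters stay fixed.

    X: (N, p) inputs, T: (N,) targets, W1: (m, p), b1: (m,), W2: (m,), b2: scalar.
    Returns the new weight row and bias for neuron i.
    """
    z = np.tanh(X @ W1.T + b1)            # hidden outputs z_i(k), shape (N, m)
    y = z @ W2 + b2                       # network output y(k)
    ds = 1.0 - z[:, i] ** 2               # s'[n_i(k)] for s = tanh
    delta = (T - y) * W2[i] * ds          # back-propagated error delta_i(k)
    dmax = np.max(np.abs(delta))
    if dmax == 0.0:                       # perfect fit: nothing to correct
        return W1[i].copy(), b1[i]
    delta_star = eps * delta / dmax       # scaled error, |delta_i*(k)| <= eps
    # Fictitious output and input; the clip is an added numerical safeguard.
    z_star = np.clip(z[:, i] + delta_star, -1 + 1e-12, 1 - 1e-12)
    n_star = np.arctanh(z_star)           # equals 0.5 * log((1 + z*) / (1 - z*))
    X_ext = np.hstack([X, np.ones((X.shape[0], 1))])   # extended inputs [X(k); 1]
    Xw = X_ext * ds[:, None]              # rows weighted by s'[n_i(k)]
    A = Xw.T @ X_ext                      # sum_k Xbar(k) s' Xbar(k)^T
    rhs = Xw.T @ n_star                   # sum_k n_i*(k) s' Xbar(k)
    w_ext = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return w_ext[:-1], w_ext[-1]
```

Because the scaled error is bounded by `eps`, each update moves the neuron only slightly, which is why the new parameters are accepted or rejected by comparing SSE values in the procedure.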



The PLS Procedure 

This section, separate from the theoretical development above, summarizes
the PLS method for estimating the vectors W and b. Figure 23 is a flowchart illustrating
the PLS procedure. The PLS method requires an initial estimate for each of the
vectors. Since there are many methods that can be employed to develop the initial
estimates, the process of developing the estimates is not, strictly speaking, a part of the
PLS method. Therefore, the PLS method presented here merely assumes that an initial
estimate is available. A preferred method for developing the initial estimates is
described below.

In a process block 2301, compute a suitable starting set of initial estimates
$\{W_i^{[1]}, b_i^{[1]}, W^{[2]}, b^{[2]}\}$, with

$$W^{[1]} = \begin{bmatrix} W_1^{[1]} \\ \vdots \\ W_m^{[1]} \end{bmatrix} \quad\text{and}\quad b^{[1]} = \begin{bmatrix} b_1^{[1]} \\ \vdots \\ b_m^{[1]} \end{bmatrix}$$

Proceeding to a process block 2302, compute an initial hidden neuron input vector $N(k)$,
for $k = 1 \ldots N$ from:

$$N(k) = [n_1(k) \ldots n_i(k) \ldots n_m(k)]^T = W^{[1]} \cdot X(k) + b^{[1]}$$

and an initial hidden neuron output vector $Z(k)$, for $k = 1 \ldots N$ from:

$$Z(k) = [z_1(k) \ldots z_i(k) \ldots z_m(k)]^T = [s[n_1(k)] \ldots s[n_i(k)] \ldots s[n_m(k)]]^T$$

and an initial neural network output $y(k)$ for $k = 1 \ldots N$;
and finally, a current SSE loss value $V_{old}$ from:

$$V_{old} = \sum_{k=1}^{N} [T(k) - y(k)]^2$$



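Proceeding to a process block 2303, for each hidden neuron ($i = 1 \ldots m$), compute
the following items:

• the derivative $s'[n_i(k)] = 1 - s[n_i(k)]^2 = 1 - z_i(k)^2$ for $k = 1 \ldots N$;
• the back propagation error:

$$\delta_i(k) = [T(k) - y(k)] \cdot W_i^{[2]} \cdot s'[n_i(k)] \quad\text{for } k = 1 \ldots N$$

• the scaled value:

$$\delta_i^*(k) = \varepsilon \cdot \frac{\delta_i(k)}{\max(|\delta_i(k)|;\ k = 1 \ldots N)}$$

• the fictitious input and output:

$$z_i^*(k) = z_i(k) + \delta_i^*(k) \quad\text{and}\quad n_i^*(k) = 0.5 \cdot \ln\left(\frac{1 + z_i^*(k)}{1 - z_i^*(k)}\right)$$

• new weights and biases for neuron i from:

$$\bar{W}_i^{[1]} = \left[\sum_{k=1}^{N} n_i^*(k) \cdot s'[n_i(k)] \cdot \bar{X}^T(k)\right] \left[\sum_{k=1}^{N} \bar{X}(k) \cdot s'[n_i(k)] \cdot \bar{X}^T(k)\right]^{-1}$$

• the corresponding new neuron input:

$$n_i(k) = W_i^{[1]} \cdot X(k) + b_i^{[1]} \quad\text{for } k = 1 \ldots N$$

• the corresponding new neuron output:

$$z_i(k) = s[n_i(k)] \quad\text{for } k = 1 \ldots N$$

• the new network output:

$$y(k) = W^{[2]} \cdot Z(k) + b^{[2]} \quad\text{for } k = 1 \ldots N$$

• and a corresponding new SSE value $V_{new}$:

$$V_{new} = \sum_{k=1}^{N} [T(k) - y(k)]^2$$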

Proceeding to a decision block 2307, if $V_{new}$ is smaller than $V_{old}$, then proceed to
a process block 2308, otherwise, jump to a process block 2309. In the process block
2308 replace the old values of $W^{[1]}$, $b^{[1]}$, $y(k)$ and $V_{old}$ with the new values of $W^{[1]}$, $b^{[1]}$,
$y(k)$, and $V_{new}$. Then proceed to the process block 2309.

In the process block 2309, for the output neuron, compute $s'[n_i(k)]$, $\delta_i(k)$, $\delta_i^*(k)$,
$z_i^*(k)$ using $W^{[1]}$ and $b^{[1]}$. Also in the process block 2309, for the output neuron,
compute $W_i^{[1]}$ and $b_i^{[1]}$ and use them to compute $z_i(k)$, $y_{new}(k)$ and $V_{new}$. In the
process block 2313, the new weights and bias for the output neuron are given by:

$$\bar{W}^{[2]} = \left[\sum_{k=1}^{N} T(k) \cdot \bar{Z}^T(k)\right] \left[\sum_{k=1}^{N} \bar{Z}(k) \cdot \bar{Z}^T(k)\right]^{-1}$$

where $\bar{Z}^T(k) = [Z^T(k) \;\; 1]$
and the new network output is given by:

$$y(k) = W^{[2]} \cdot Z(k) + b^{[2]} \quad\text{for } k = 1 \ldots N$$

Then proceed to a decision block 2313. In the decision block 2313, if $V_{new}$ is
less than $V_{old}$ then proceed to a process block 2314, otherwise jump to a decision block
2315. In the process block 2314 replace the old values of $W^{[2]}$, $b^{[2]}$, $y(k)$ and $V_{old}$ with the
new values of $W^{[2]}$, $b^{[2]}$, $y(k)$, and $V_{new}$.

In the decision block 2315, if the SSE value $V$ has not stopped changing or
reached some specified small value then processing returns to the process block 2302
for another iteration, otherwise, the process advances to an end block 2316 and
terminates.

The result of the procedure in Figure 23 is a new set of parameters
$[W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}]$ and related network internal variables $\{N(k), Z(k)\}$ and output
values $\{y(k), V\}$. As indicated in decision block 2315, the whole procedure can be
repeated a number of times until the decrease of $V$ is zero or less than a specified small
value. As is always the case with nonlinear search procedures, the choice of a good set
of initial values is of utmost importance in order to reduce the number of iterations and
to prevent getting stuck in local minima.
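The overall flow of the procedure (update each hidden neuron, then the output neuron, accept a change only when the SSE decreases, and stop when the SSE no longer changes) can be sketched in Python. This is a hedged illustration under stated assumptions: NumPy, a tanh sigmoid, an assumed `eps` of 1e-4, an added clipping safeguard, and hypothetical names (`forward`, `sse`, `pls_train`, `tol`, `max_iter`). The training data are synthetic.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    Z = np.tanh(X @ W1.T + b1)                 # hidden outputs Z(k)
    return Z, Z @ W2 + b2                      # and network output y(k)

def sse(T, y):
    return float(np.sum((T - y) ** 2))

def pls_train(X, T, W1, b1, W2, b2, eps=1e-4, tol=1e-10, max_iter=200):
    """Iterate hidden-layer and output-layer updates, keeping only improvements."""
    W1, b1, W2 = W1.copy(), b1.copy(), W2.copy()
    X_ext = np.hstack([X, np.ones((len(X), 1))])
    Z, y = forward(X, W1, b1, W2, b2)
    V_old = sse(T, y)
    history = [V_old]
    for _ in range(max_iter):
        V_start = V_old
        for i in range(W1.shape[0]):           # hidden neurons, one at a time
            ds = 1.0 - Z[:, i] ** 2            # s'[n_i(k)]
            delta = (T - y) * W2[i] * ds       # back propagation error
            dmax = np.max(np.abs(delta))
            if dmax == 0.0:
                continue
            z_star = np.clip(Z[:, i] + eps * delta / dmax, -1 + 1e-12, 1 - 1e-12)
            n_star = np.arctanh(z_star)        # fictitious neuron input
            Xw = X_ext * ds[:, None]
            w_ext = np.linalg.lstsq(Xw.T @ X_ext, Xw.T @ n_star, rcond=None)[0]
            W1_try, b1_try = W1.copy(), b1.copy()
            W1_try[i], b1_try[i] = w_ext[:-1], w_ext[-1]
            Z_try, y_try = forward(X, W1_try, b1_try, W2, b2)
            V_new = sse(T, y_try)
            if V_new < V_old:                  # accept only if the SSE decreases
                W1, b1, Z, y, V_old = W1_try, b1_try, Z_try, y_try, V_new
        Z_ext = np.hstack([Z, np.ones((len(Z), 1))])
        w2_ext = np.linalg.lstsq(Z_ext, T, rcond=None)[0]   # output neuron update
        y_try = Z_ext @ w2_ext
        V_new = sse(T, y_try)
        if V_new < V_old:
            W2, b2, y, V_old = w2_ext[:-1], float(w2_ext[-1]), y_try, V_new
        history.append(V_old)
        if V_start - V_old <= tol:             # SSE has stopped changing
            break
    return W1, b1, W2, b2, history

# Usage on synthetic data (all values are made up for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(60, 2))
T = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
m = 3
W1, b1 = 0.1 * rng.normal(size=(m, 2)), np.zeros(m)
W2, b2 = 0.1 * rng.normal(size=m), 0.0
*_, history = pls_train(X, T, W1, b1, W2, b2)
```

The accept-or-reject test makes the recorded SSE sequence non-increasing by construction, whatever the quality of the starting values.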

Initialization

A preferred approach to the initialization problem is to start from the parameters
of the linear model:

$$y(k) = \sum_{j} w_j \cdot x_j(k) + b = W \cdot X(k) + b$$

(1) Compute the parameters $\bar{W} = [W \;\; b]$ by minimizing the SSE loss $V$, where:

$$V(\bar{W}) = \sum_{k=1}^{N} e^2(k) = \sum_{k=1}^{N} [T(k) - y(k)]^2$$

leading to the least squares solution

$$\bar{W} = \left[\sum_{k=1}^{N} T(k) \cdot \bar{X}^T(k)\right] \left[\sum_{k=1}^{N} \bar{X}(k) \cdot \bar{X}^T(k)\right]^{-1}$$

(2) Select $m$ positive random numbers $\{a_1, \ldots, a_i, \ldots, a_m\}$ such that

$$a_i \le \frac{0.1}{\max(|y(k)|;\ k = 1 \ldots N)}$$

Set $\bar{W}_i^{[1]} = a_i \cdot \bar{W}$ and:

$$\bar{W}^{[2]} = \left[\frac{1}{m \cdot a_1} \;\ldots\; \frac{1}{m \cdot a_i} \;\ldots\; \frac{1}{m \cdot a_m} \;\; 0\right]$$

This selection assures that each hidden neuron input, being given by

$$n_i(k) = \bar{W}_i^{[1]} \cdot \bar{X}(k) = a_i \cdot \bar{W} \cdot \bar{X}(k) = a_i \cdot y(k)$$

lies between -0.1 and +0.1, so that the values are in the linear zone around 0 on the
sigmoid curve, thus:

$$z_i(k) = s[n_i(k)] \approx n_i(k) = a_i \cdot y(k)$$

The neural net output for this choice of initial values, being given by:

$$\sum_{i=1}^{m} w_i^{[2]} \cdot z_i(k) + b^{[2]} \approx \sum_{i=1}^{m} \frac{1}{m \cdot a_i} \cdot a_i \cdot y(k) + 0 = y(k)$$

will thus be close to the linear model output, which is a reasonable start condition.
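The initialization can be checked numerically with a short Python sketch. It assumes NumPy and a tanh sigmoid. The lower bound used when drawing the random numbers (one tenth of the upper bound) is an arbitrary choice that is not stated in the source, and the name `init_from_linear_model` is hypothetical. The sketch also assumes the linear model output is not identically zero.

```python
import numpy as np

def init_from_linear_model(X, T, m, rng):
    """Initial network parameters derived from a least squares linear fit.

    X: (N, p) inputs, T: (N,) targets, m: number of hidden neurons.
    Returns (W1, b1, W2, b2, y_lin), with y_lin the linear model output.
    """
    X_ext = np.hstack([X, np.ones((X.shape[0], 1))])
    w_lin = np.linalg.lstsq(X_ext, T, rcond=None)[0]    # [W b] of the linear model
    y_lin = X_ext @ w_lin
    a_max = 0.1 / np.max(np.abs(y_lin))                 # upper bound on each a_i
    a = rng.uniform(0.1 * a_max, a_max, size=m)         # positive random numbers
    W1 = a[:, None] * w_lin[:-1]                        # hidden weights a_i * W
    b1 = a * w_lin[-1]                                  # hidden biases a_i * b
    W2 = 1.0 / (m * a)                                  # output weights 1 / (m a_i)
    b2 = 0.0                                            # output bias
    return W1, b1, W2, b2, y_lin

# Synthetic example: a mostly linear target with a small nonlinear term.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
T = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + 0.1 * np.sin(X[:, 0])
W1, b1, W2, b2, y_lin = init_from_linear_model(X, T, 4, rng)
n = X @ W1.T + b1                  # hidden neuron inputs, each within [-0.1, 0.1]
y_nn = np.tanh(n) @ W2 + b2        # initial network output
```

With hidden inputs confined to ±0.1, tanh deviates from its argument by at most about a third of a percent, so the initial network output tracks the linear model output to within that relative margin.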

The SoftSensor Embodiment 

In yet another embodiment of model-based predictive controllers, the linear and
non-linear models disclosed above can be further enhanced by adding a softsensor
model to the basic MBPC fabrication system 1400.

The temperature of the wafer surface is of major importance for the deposition
process. However, the point-to-point wafer temperature is not measured during normal
operation. Experiments have indicated that the susceptor temperatures give a reasonable
approximation of the unknown wafer temperature distribution. There are also
experimental results which indicate that good susceptor control alone is not sufficient to
obtain very tight wafer control.

Temperature transients (ramp-up/ramp-down) are typical situations in which
wafer and susceptor temperatures might differ considerably. This is due to the different
mass (heat capacity) of susceptor and wafer. Good susceptor control with no (or very
low) temperature overshoot does not necessarily lead to wafer control with low
overshoot. Moreover, the front 46, side 48 and rear 50 susceptor setpoints require the
specification of an offset with respect to the center 44 susceptor setpoint in order to
result in a good temperature uniformity over the wafer surface. In the prior art, these
offsets are found by trial and error.

The more systematic method and apparatus presented here, which solves
the above problems, is the use of MBPC combined with the softsensor principle. The
concept is that the unmeasured wafer temperature can be replaced by the outcome of a
model describing the dynamic relationship between susceptor and wafer temperatures.
In the preferred embodiment, this softsensor model is identified using data obtained
from experiments with an instrumented wafer.
Figure 24 is a block diagram that illustrates an extension of the basic fabrication
system 1400 to a softsensor fabrication system 2400. A recipe block 2401 provides
input into a setpoint generator block 2410. An output of the setpoint generator block
provides input to a MBPC process block 2402 and a softsensor process block 2412. An
output of the softsensor process block 2412 is a wafer estimate 2414. The
wafer estimate 2414 is fed back into the setpoint generator block 2410. The MBPC
process block 2402 outputs control signals to a reactor and lamp system 2404. A group
of unmeasured outputs from the reactor process block 2404 are the wafer surface
temperatures 2405. A group of measurable outputs from the reactor process block 2404
are the susceptor temperatures 2406. The susceptor temperatures are fed back into the
MBPC process block 2402 to facilitate temperature control of the wafer 22 and the
susceptor 24.

The recipe 2501 is used as setpoint for the susceptor temperature. Then, in the
basic control structure, the recipe is interpreted as setpoint for the wafer temperature.
The setpoints for the susceptor control are then computed internally in the control
strategy, using the softsensor principle.

A model, describing the dynamic relationship between susceptor setpoints and
wafer temperatures, is identified using an instrumented wafer. The instrumented wafer
is a special wafer which has temperature sensors on the surface of the wafer 20. This
allows actual wafer surface temperatures to be measured. These measured values are
used to obtain modeling coefficients for the softsensor process block 2412. During
normal operation of the reactor, the softsensor process block 2412, being a part of the
control software, can be used to generate an estimate of the wafer temperature.

An inverse softsensor model is then used to generate intermediate signals, which
are further used as setpoints for the standard susceptor controller. In a preferred
embodiment, the setpoint generator 2410 is a PID filter and the softsensor block 2412 is a
linear FIR filter.
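How a linear FIR softsensor might be identified from instrumented-wafer data and then used as an estimator can be sketched in Python. This is an illustrative assumption only: the description does not give the filter order, the identification method or any data, so the least squares fit, the function names `fit_fir` and `fir_estimate`, the order of 3 and the synthetic signals below are all hypothetical.

```python
import numpy as np

def fit_fir(u, y, order):
    """Least squares FIR coefficients h such that y(k) ~ sum_j h[j] * u(k - j).

    u: input sequence (for example a susceptor temperature signal),
    y: measured output sequence (for example instrumented-wafer temperature).
    """
    # Row k holds [u(k), u(k-1), ..., u(k-order+1)] for k = order-1 ... len(u)-1.
    U = np.array([u[k - order + 1:k + 1][::-1] for k in range(order - 1, len(u))])
    h, *_ = np.linalg.lstsq(U, y[order - 1:], rcond=None)
    return h

def fir_estimate(u, h):
    """Softsensor output; samples before u[0] are treated as zero."""
    return np.convolve(u, h)[:len(u)]

# Synthetic identification experiment with a known filter (made-up values).
rng = np.random.default_rng(0)
h_true = np.array([0.5, 0.3, 0.2])
u = rng.normal(size=200)
y = np.convolve(u, h_true)[:200]
h_hat = fit_fir(u, y, order=3)
y_hat = fir_estimate(u, h_hat)
```

In normal operation only the input signal is available, so the fitted coefficients stand in for the missing wafer measurement.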

The result is that the wafer temperatures, and not the susceptor temperatures, are
controlled towards the values specified in the recipe. This procedure also computes,
automatically, the necessary offsets for center 44, front 46, side 48 and rear 50 susceptor
setpoints in order to bring all wafer temperatures close to the recipe. This leads to better
uniformity of the temperatures over the wafer surface.

Conclusion 

While the present invention has been particularly shown and described with
reference to preferred embodiments thereof, it will be understood by those skilled in the
art that various changes in form and detail may be made therein without departing from
the spirit, scope and teaching of the invention. Accordingly, the embodiments herein
disclosed are to be considered merely as illustrative and limited in scope only as
specified in the appended claims.
WE CLAIM:

1. A temperature controlled thermal process reactor comprising:
a reaction chamber enclosing an object to be heated;
a source of thermal energy;
a thermal sensor; and
a model-based predictive temperature controller.

2. The temperature controlled thermal process reactor of Claim 1, wherein 
the model-based predictive temperature controller comprises multivariable temperature 
control. 

3. The temperature controlled rapid thermal process reactor of Claim 2,
wherein the model-based predictive temperature controller comprises:
a multivariable thermal process model which relates multivariable
process input thermal energy to multivariable process output temperature;
a prediction calculator which uses said thermal process model, to
calculate a predicted nominal temperature output over a predetermined future
time period; and
a control calculator which uses said predicted nominal temperature
output to calculate an optimum control strategy by which to control said source
of thermal energy.

4. The temperature controlled thermal process reactor of Claim 3, wherein
said prediction calculator calculates the predicted nominal temperature output using an
auto-regressive moving average, having a predetermined prediction horizon.

5. The temperature controlled thermal process reactor of Claim 4, wherein 
the prediction calculator assumes a future control strategy. 

6. The temperature controlled thermal process reactor of Claim 5, wherein
said predicted nominal temperature output is calculated recursively over a
predetermined future time period.

7. The temperature controlled thermal process reactor of Claim 6, wherein
said thermal process model substantially decouples the influence of system input
variables from system input disturbance.

8. The temperature controlled thermal process reactor of Claim 3, wherein
the control calculator compares said predicted nominal temperature output to a desired
future temperature output to derive said optimum control strategy.

9. The temperature controlled thermal process reactor of Claim 3, wherein
said thermal process model is a nonlinear model.

10. The temperature controlled thermal process reactor of Claim 3, wherein 
said thermal process model is based on a neural network. 

11. The temperature controlled thermal process reactor of Claim 1, wherein
the model-based predictive temperature controller comprises nonlinear multivariate
temperature control.

12. The temperature controlled rapid thermal process reactor of Claim 11,
wherein the nonlinear model-based predictive temperature controller comprises:
a nonlinear multivariate thermal process model which relates
multivariate process input thermal energy to multivariate process output
temperature;
a prediction calculator which uses said thermal process model, to
calculate a predicted nominal temperature output over a predetermined future
time period; and
a control calculator which uses said predicted nominal temperature
output to calculate an optimum control strategy by which to control said source
of thermal energy.

13. The temperature controlled thermal process reactor of Claim 12, wherein
said prediction calculator calculates the predicted nominal temperature output using a
neural network.

14. The temperature controlled thermal process reactor of Claim 13, wherein
the prediction calculator assumes a future control strategy.

15. The temperature controlled thermal process reactor of Claim 14, wherein
said neural network is a feed forward network.

16. The temperature controlled thermal process reactor of Claim 15, wherein
said neural network comprises a hidden layer of neurons.




17. The temperature controlled thermal process reactor of Claim 16, wherein
said hidden layer of neurons comprises nonlinear sigmoid-type neurons.

18. The temperature controlled thermal process reactor of Claim 13, wherein
said neural network is trained using a pseudo least squares method.

19. The temperature controlled thermal process reactor of Claim 12, wherein
the control calculator compares said predicted nominal temperature output to a desired
future temperature output to derive said optimum control strategy.

20. The temperature controlled thermal process reactor of Claim 1, further
comprising a softsensor model.

21. The temperature controlled thermal process reactor of Claim 20, wherein
said softsensor model is created from a dataset generated by using an instrumented
wafer.

22. The temperature controlled thermal process reactor of Claim 1, further
comprising a setpoint generator, said setpoint generator automatically generating a
correction to said recipe inputs into said thermal process reactor, said correction
facilitating control of actual wafer surface temperatures.

23. The temperature controlled thermal process reactor of Claim 22, said 
correction facilitating improved control of actual wafer surface temperatures based on 
measurement of susceptor temperatures. 

24. A temperature control system for controlling a thermal process
comprising:
a controllable source of thermal energy;
a temperature sensor; and
a model-based predictive temperature controller comprising:
a thermal process model which relates process input thermal
energy to a process output temperature;
a prediction calculator which uses said thermal process model to
calculate a predicted nominal temperature output over a predetermined
future time period; and
a control calculator which uses said predicted nominal
temperature output to calculate an optimum strategy by which to control
said source of thermal energy, said controller generating output signals to
said source of thermal energy in response to said optimum strategy.

25. The temperature control system of Claim 24, wherein said thermal
process model substantially decouples the influence of system input variables from
system input disturbances.

26. The temperature control system of Claim 24, wherein said prediction 
calculator includes a postulated future control strategy. 

27. The temperature control system of Claim 24, wherein said thermal 
process model is a nonlinear model. 

28. The temperature control system of Claim 27, wherein said thermal
process model substantially decouples the influence of system input variables from
system input disturbances.

29. The temperature control system of Claim 27, wherein said prediction
calculator comprises a neural network.

30. The temperature control system of Claim 27, wherein said prediction
calculator comprises a feed forward neural network, said prediction calculator having a
receding calculation horizon.

31. A method of controlling a thermal process comprising the steps of:
measuring a process output temperature;
using a model to predict a future process output temperature;
calculating an optimum process input control strategy; and
controlling a process input thermal energy using the calculated optimum
process input control strategy.

32. The method of Claim 31, wherein the step of predicting a future process
output temperature comprises:
identifying a thermal process model which relates process input thermal
energy to process output temperature; and
recursively predicting future process output temperatures using said
thermal process model, said process output temperature predicted over a
predetermined future time period.

33. The method of Claim 32, wherein the step of predicting future process
output temperatures further comprises periodically updating said predictions in
accordance with a receding horizon calculation.

34. The method of Claim 31, wherein the step of predicting a future process
output temperature comprises postulating a stationary future control strategy.

35. The method of Claim 31, wherein the step of calculating an optimum 
process input control strategy comprises comparing said predicted future process output 
temperatures to a desired future process output temperature. 

36. The method of Claim 31, wherein the step of predicting a future process
output temperature comprises:
identifying a nonlinear thermal process model which relates process
input thermal energy to process output temperature; and
training a neural network to predict future process output temperatures
using said thermal process model, said process output temperature predicted
over a predetermined future time period.

37. The method of Claim 36, wherein the step of predicting future process
output temperatures further comprises periodically updating said predictions in
accordance with a receding horizon calculation.

38. The method of Claim 36, wherein the step of predicting a future process
output temperature comprises postulating a stationary future control strategy.

39. The method of Claim 36, wherein the step of calculating an optimum 
process input control strategy comprises comparing said predicted future process output 
temperatures to a desired future process output temperature. 




AMENDED CLAIMS

[received by the International Bureau on 31 July 1997 (31.07.97);
original claims 1, 5-8, 24-26 and 31 amended; remaining claims unchanged (5 pages)]

1. A temperature controlled thermal process reactor comprising:
a reaction chamber enclosing an object to be heated;
a source of thermal energy which heats said object;
a thermal sensor which measures a temperature related to a temperature
of said object and which provides an input signal representative of said
temperature; and
a model-based predictive temperature controller which receives said
output signal representative of said temperature and which controls said source
of thermal energy in response to said output signal.

2. The temperature controlled thermal process reactor of Claim 1, wherein 
the model-based predictive temperature controller comprises multivariable temperature 
control. 

3. The temperature controlled rapid thermal process reactor of Claim 2,
wherein the model-based predictive temperature controller comprises:
a multivariable thermal process model which relates multivariable
process input thermal energy to multivariable process output temperature;
a prediction calculator which uses said thermal process model, to
calculate a predicted nominal temperature output over a predetermined future
time period; and
a control calculator which uses said predicted nominal temperature
output to calculate an optimum control strategy by which to control said source
of thermal energy.

4. The temperature controlled thermal process reactor of Claim 3, wherein
said prediction calculator calculates the predicted nominal temperature output using an
auto-regressive moving average, having a predetermined prediction horizon.

5. The temperature controlled thermal process reactor of Claim 4, wherein 
the prediction calculator calculates an unoptimized initial estimate for a future control 
strategy. 

6. The temperature controlled thermal process reactor of Claim 5, wherein
said predicted nominal temperature output is calculated recursively over a
predetermined future time period using a recursive approximation strategy, said
recursive approximation strategy beginning with said unoptimized initial estimate.

7. The temperature controlled thermal process reactor of Claim 6, wherein
said thermal process model has parameters selected to substantially decouple the
influence of system input variables from system input disturbance.

8. The temperature controlled thermal process reactor of Claim 3, wherein
the control calculator compares said predicted nominal temperature output to a desired
future temperature output and uses said comparison in a recursive algorithm to compute
said optimum control strategy.

9. The temperature controlled thermal process reactor of Claim 3, wherein
said thermal process model is a nonlinear model.

10. The temperature controlled thermal process reactor of Claim 3, wherein 
said thermal process model is based on a neural network. 

11. The temperature controlled thermal process reactor of Claim 1, wherein
the model-based predictive temperature controller comprises nonlinear multivariable
temperature control.

12. The temperature controlled rapid thermal process reactor of Claim 11,
wherein the nonlinear model-based predictive temperature controller comprises:
a nonlinear multivariable thermal process model which relates
multivariable process input thermal energy to multivariable process output
temperature;
a prediction calculator which uses said thermal process model, to
calculate a predicted nominal temperature output over a predetermined future
time period; and
a control calculator which uses said predicted nominal temperature
output to calculate an optimum control strategy by which to control said source
of thermal energy.

13. The temperature controlled thermal process reactor of Claim 12, wherein
said prediction calculator calculates the predicted nominal temperature output using a
neural network.

14. The temperature controlled thermal process reactor of Claim 13, wherein 
the prediction calculator assumes a future control strategy. 
15. The temperature controlled thermal process reactor of Claim 14, wherein
said neural network is a feed forward network.

16. The temperature controlled thermal process reactor of Claim 15, wherein
said neural network comprises a hidden layer of neurons.

17. The temperature controlled thermal process reactor of Claim 16, wherein
said hidden layer of neurons comprises nonlinear sigmoid-type neurons.

18. The temperature controlled thermal process reactor of Claim 13, wherein 
said neural network is trained using a pseudo least squares method. 

19. The temperature controlled thermal process reactor of Claim 12, wherein
the control calculator compares said predicted nominal temperature output to a desired
future temperature output to derive said optimum control strategy.

20. The temperature controlled thermal process reactor of Claim 1, further
comprising a softsensor model.

21. The temperature controlled thermal process reactor of Claim 20, wherein
said softsensor model is created from a dataset generated by using an instrumented
wafer.

22. The temperature controlled thermal process reactor of Claim 1, further
comprising a setpoint generator, said setpoint generator automatically generating a
correction to said recipe inputs into said thermal process reactor, said correction
facilitating control of actual wafer surface temperatures.

23. The temperature controlled thermal process reactor of Claim 22, said 
correction facilitating improved control of actual wafer surface temperatures based on 
measurement of susceptor temperatures. 

24. A temperature control system for controlling a thermal process
comprising:
a controllable source of thermal energy which heats an object;
a temperature sensor which measures a temperature related to a
temperature of said object and which generates an output signal responsive to
said temperature; and
a model-based predictive temperature controller which receives said
output signal representative of said temperature and which controls said source
of thermal energy in response to said output signal, said controller comprising:
a thermal process model which relates process input thermal
energy to a process output temperature;
a prediction calculator which uses said thermal process model to
calculate a predicted nominal temperature output over a predetermined
future time period; and
a control calculator which uses said predicted nominal
temperature output to calculate an optimum strategy by which to control
said source of thermal energy, said controller generating output signals to
said source of thermal energy in response to said optimum strategy.

25. The temperature control system of Claim 24, wherein said thermal
process model has parameters selected to substantially decouple the influence of system
input variables from system input disturbances.

26. The temperature control system of Claim 24, wherein said prediction
calculator includes a postulated future control strategy and a recursive algorithm to
optimize said postulated future control strategy.

27. The temperature control system of Claim 24, wherein said thermal 
process model is a nonlinear model. 

28. The temperature control system of Claim 27, wherein said thermal
process model substantially decouples the influence of system input variables from
system input disturbances.

29. The temperature control system of Claim 27, wherein said prediction
calculator comprises a neural network.

30. The temperature control system of Claim 27, wherein said prediction
calculator comprises a feed forward neural network, said prediction calculator having a
receding calculation horizon.

31. A method of controlling a thermal process comprising the steps of:
measuring a process output temperature;
using a model to predict a future process output temperature;
using said measured process output temperatures and said predicted
future process temperature to calculate an optimum process input control
strategy; and
controlling a process input thermal energy using the calculated optimum
process input control strategy.

32. The method of Claim 31, wherein the step of predicting a future process
output temperature comprises:
identifying a thermal process model which relates process input thermal
energy to process output temperature; and
recursively predicting future process output temperatures using said
thermal process model, said process output temperature predicted over a
predetermined future time period.

33. The method of Claim 32, wherein the step of predicting future process
output temperatures further comprises periodically updating said predictions in
accordance with a receding horizon calculation.

34. The method of Claim 31, wherein the step of predicting a future process
output temperature comprises postulating a stationary future control strategy.

35. The method of Claim 31, wherein the step of calculating an optimum
process input control strategy comprises comparing said predicted future process output
temperatures to a desired future process output temperature.

36. The method of Claim 31, wherein the step of predicting a future process
output temperature comprises:
identifying a nonlinear thermal process model which relates process
input thermal energy to process output temperature; and
training a neural network to predict future process output temperatures
using said thermal process model, said process output temperature predicted
over a predetermined future time period.

37. The method of Claim 36, wherein the step of predicting future process
output temperatures further comprises periodically updating said predictions in
accordance with a receding horizon calculation.

38. The method of Claim 36, wherein the step of predicting a future process
output temperature comprises postulating a stationary future control strategy.

39. The method of Claim 36, wherein the step of calculating an optimum
process input control strategy comprises comparing said predicted future process output
temperatures to a desired future process output temperature.